Leah Broaddus and Pam Hackbart-Dean

A tradition of access
Creating a diversity news index using OCLC’s CONTENTdm

OCLC’s CONTENTdm digital collection management software has been used as a platform for many interesting and timely archival projects. Morris Library Special Collections Research Center at Southern Illinois University-Carbondale (SIUC) has successfully used this platform to migrate and host digitized archival photograph collections. The center acquired administrative access to CONTENTdm through the Consortium of Academic and Research Libraries in Illinois (CARLI).

Inspired by presentations at the Midwest Archives Conference and Society of American Archivists workshops, we became increasingly interested in exploring CONTENTdm’s use with different formats, such as digitized oral histories and transcribed news articles. We hoped to use it to create a campus news index documenting SIUC’s history of diversity, particularly its commitment to ethnic and racial diversity. Such a project might also promote interest in other un-indexed student sources.

Our challenge was to find a campus collaborator with a vested interest in the project to provide funding for the extensive hours needed to populate a meaningful, searchable index. We hope that this ongoing project will serve as a catalyst for others on our campus, and perhaps beyond.

Screenshot of the Daily Egyptian Diversity News Index public entry page.

How the project came to be

Initial impetus to create a diversity news index came about by happy accident. While gathering sources for an alumni reunion display, the university archivist came upon some diversity articles from the late 1950s (by scrolling through the un-indexed microfilm of the student newspaper, Daily Egyptian) and sent them to the associate chancellor for diversity.
The associate chancellor is an alumnus and former basketball star, so the archivist included a few highlighted articles relating to his athletic performances. A friendly note was attached, hinting at the wealth of campus historical material that must be stored in these un-indexed papers. A few weeks later the associate chancellor telephoned to ask what it would take to initiate a project of finding more articles on diversity at SIUC and republishing them for wider accessibility and research. At the end of the conversation, the associate chancellor suggested that a formal proposal be written and submitted to him for consideration.

Leah Broaddus is university archivist, e-mail: lbroaddu@lib.siu.edu, and Pam Hackbart-Dean is the director of Special Collections Research Center, e-mail: phdean@lib.siu.edu, at Southern Illinois University-Carbondale. © 2009 Leah Broaddus and Pam Hackbart-Dean. C&RL News, June 2009.

Developing the proposal

We dropped what we were working on that afternoon and sat down to formulate a plan. Using a template the director had drawn up for another project, we took a survey of the microfilmed material, checked some dates for copyright concerns regarding digitization, telephoned the library programmer to consult about two online delivery options, and drew up a brief proposal, as requested.

The proposal included 1) a description of how the goal of the project was relevant to the campus and community, 2) the software we would use for online delivery, and 3) bulleted and diagrammed estimations of the student hours needed and the amount of diversity-related materials we thought we might find within predetermined year spans.

It is important to note that, preserved on microfilm in the Morris Library Special Collections Research Center, the Daily Egyptian contains a wealth of information about student life dating back to 1869. The early articles and images had never been searchable online.
Our project aimed to make them as widely available as possible to SIUC students, faculty, administration, and the general public, providing a dynamic historical and academic resource for many years to come. Therefore, the plan for this ongoing project entails surveying the microfilm, locating and scanning relevant articles, and creating searchable transcripts for Web access. The scanned files are further processed with optical character recognition (OCR) software to create full-text versions of the content. The resulting digitized material is searchable online.

Student is hired, trained to do the work

The associate chancellor agreed to fund a student position to work on the project. Special Collections provided the job description, the associate chancellor interviewed and identified a student who he felt could do the work, and then he introduced the student to us for approval. Carefully writing the job description proves very important when a project involves control that extends beyond our offices. Because we foresaw that the student would be working with just one collection, on one project, and would have to be trained for such specific work, we preferred to advertise for broader learning traits and interests rather than seeking students who already had experience with digital projects. The job requirements included:

• attention to detail and accuracy;
• ability to exercise independent judgment;
• excellent written and oral communication skills;
• ability to take instruction and follow directions;
• hardworking, meticulous, accountable, and deadline-oriented; and
• general computer skills.

It was very simple to train the student. First he spent several weeks identifying microfilm articles, printing them off, and marking them with the source and date. Next, he patiently participated in several major false starts as we experimented with the scanning process. Following that, we trained him to edit the images in preparation for running OCR software.
Then he worked on the detailed process of editing badly garbled text. He then learned to create and enter metadata describing the articles, and to upload them to the online database in CONTENTdm.

Obstacles

While the project is a success because the goal is being met, there have been obstacles along the way, mainly with scanning for OCR.

Scanning for OCR

One can expect problems when scanning from microfilm to a digital format without the proper equipment. We intended to run OCR software on the scans. We found that scans that did justice to the images did not do as well for text, and vice versa.

Screenshot of the CONTENTdm data entry interface.

Scanning the articles from the microfilm reader printouts was one option, but some of the older microfilm had been created after the articles had already faded. In many cases, the microfilm itself had also deteriorated through use, with scratches running through the text. The quality of the printouts from the microfilm reader was low, with very few contrast and lighting adjustment options. Many of the articles were too large to print on a single page, requiring the student to apply a zoom lens view that made the text characters illegibly small on the printouts. Finally, the quality of the images was further reduced by subsequent digital scanning.

The resulting OCR from these digital images was easy to harvest rapidly, but the corrupt and garbled transcripts disproportionately slowed the editing process. We returned to the associate chancellor to discuss the possibility of outsourcing the digitization, but due to funding limitations, sending the materials off campus was not an option at that point in the project.

Microfilm versus microfiche

Meanwhile, the Micrographics Department on campus informed us that they had microfiche negatives of the Daily Egyptian on file, which were in better condition than the film.
Using copies of some of the fiche, we experimented with the library's newest high-resolution scanner, but even the highest resolution settings yielded dismally blurry results.

In the end, we were permitted to train our student assistant to use the Micrographics microfiche digital scanner. The Micrographics Department asked that we work during nonpeak hours and stand ready to cede to anyone who needed the machine. This system worked, with a few minor glitches.

We initially scanned at 200 dpi, zoomed in as close to the borders of each article as possible at text-readable size. The quality of any illustrations embedded in the text was low, both because of the contrast settings that yielded optimal OCR results and because of the quality of the original microfilm. However, we felt that since the purpose of the project was primarily to index and describe these materials in a text-searchable manner, we would overlook the lesser quality of the article illustrations in the hope that patrons could request better images from us directly, if needed.

Working with the software

Our consortium-provided subscription to CONTENTdm did not include access to the product’s built-in PDF OCR component. There was, however, the option of running external OCR software to create and edit text documents on our own. CONTENTdm simply required that we save the text and image documents with the same file names (different extensions) and upload them together using the “Acquisition Station” software instead of working via the Internet entry interface.

To get the database ready for the data, our library programmer set up metadata fields, including one designated for the transcript. He trained us in mapping the fields to draw data from the files automatically and combine it into a single file for each article on upload. He also provided troubleshooting when the uploads failed.
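The same-file-name requirement described above is easy to check before a batch is uploaded. The following is a minimal sketch of such a check, not part of CONTENTdm or of our actual workflow: the function name, the image and transcript extensions, and the example file names are all assumptions made for illustration.

```python
from pathlib import Path

# Assumed formats; the article does not say which extensions were used.
IMAGE_EXTS = {".tif", ".tiff", ".jpg", ".jpeg"}
TEXT_EXT = ".txt"


def pair_scans_with_transcripts(folder):
    """Group files by base name and report which articles are complete.

    Returns three sorted lists of base names:
    (paired, images missing a transcript, transcripts missing an image).
    """
    images, texts = set(), set()
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        suffix = path.suffix.lower()  # treat .TIF and .tif alike
        if suffix in IMAGE_EXTS:
            images.add(path.stem)
        elif suffix == TEXT_EXT:
            texts.add(path.stem)
    paired = sorted(images & texts)
    missing_text = sorted(images - texts)
    missing_image = sorted(texts - images)
    return paired, missing_text, missing_image


if __name__ == "__main__":
    # Hypothetical folder name holding one batch of scans and transcripts.
    batch = Path("batch_01")
    if batch.is_dir():
        ready, no_text, no_image = pair_scans_with_transcripts(batch)
        print("Ready to upload:", ready)
        print("Scans without transcripts:", no_text)
        print("Transcripts without scans:", no_image)
```

Running a check like this on each batch folder would flag mismatched names before upload, though it would not explain image files that fail to import for other reasons.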
The total process took several weeks of trials before running smoothly, at which point we taught it to the student, who had been occupied with scanning during that period.

Some of the image files we created would not upload, for reasons we were never able to identify. When several images failed in a batch, the program would freeze and the whole batch would fail. For the first several batches, we had to click on each individual image to import it to the Acquisition Station (setting aside those that failed in a separate folder). Happily, as our scanning process became more consistent, subsequent batches of images worked more smoothly.

Scan of an announcement in the May 2, 1961, Daily Egyptian. “Dick Gregory, Dizzy Gallespie Here Thursday.”

Lessons learned

The following quick tips represent some of the lessons we learned, and may be helpful to librarians contemplating small, peripheral projects involving extra-library funding:

1. First, learn the type of role an administrator wants to have in a project. Do not assume that it is the same role you would want.
2. Work quickly. When someone asks for a proposal, their enthusiasm is immediate, and if you delay, your opportunity may be absorbed or replaced by some other idea. If an administrator asks you for something, get it to him or her as quickly as you can, and then wait patiently before following up.
3. Do not mistake a campus project for a national grant: keep it simple. Be succinct, knowledgeable, and direct. Have several solutions in mind, and be clear about what is required to transform a project proposal into a reality.
4. If funding comes from sources within the university, it may be helpful to give the decision maker a choice of funding levels (for this amount, we can do A; for more, we can do B; etc.).
5. Expect glitches at the beginning of each stage of the project, but do not let that stop you from getting started.
Adjustments will need to be made along the way when working with OCR, historical data formats, and even the most user-friendly database systems.

Thanks to a clear vision, and by being open, flexible, and adaptable, we have launched a collaborative project that benefits students, faculty, and the administration of our university. Future dreams for the Daily Egyptian Diversity News Index include:

• wide use by students, faculty, and administration, as well as the community at large;
• feedback on improving this resource; and
• expanding the index contents and adding more formats, to include related photographs, manuscripts, and official records.

This project has already generated a lot of positive buzz. Ultimately, we have learned that by saying yes to possibilities, tapping each other’s knowledge (such as expertise in scanning, CONTENTdm, etc.), embracing learning, and putting snapshots of our institutional history in a position to promote ourselves, we can create something positive and satisfying for everyone.

Please visit our successful Daily Egyptian Diversity News Index project at www.lib.siu.edu/diversitycollection.