The Code4Lib Journal
Issue 18, 2012-10-03

LibALERTS: An author-level subscription system

Patron requests for the ability to subscribe to their favorite authors, so that they could receive notifications when new titles are released, presented an opportunity for Westlake Porter Public Library to learn, to build, and to engage with patrons on the development of a new service. The library’s libALERTS service, which launched in June 2012, was the culmination of a process that involved the development of a Drupal-based website augmented with a hand-coded preprocess interface that addressed critical concerns for the effectiveness of the service.

By Matt Weaver

Introduction: A “Roll Your Own” approach

This paper discusses the development process and launch of Westlake Porter Public Library’s libALERTS service, an author-subscription service. In 2011, a couple of patrons at Westlake Porter Public Library asked about such a service. Apparently one of our vendors, Dear Reader, had been working on subject-level notifications. I am strident in my belief that libraries need to solve as many of these problems on our own as we can, so I investigated what was possible. During the process, I did find one vendor, engagedpatrons.org, which provides an alert service at a reasonable price.

Westlake Porter Public Library, as a Sirsi customer, used to have the “Tell me when” functionality in the OPAC, which would send notifications to patrons when new items came out, based on their past checkouts. Because the list could not be refined to eliminate categories that patrons did not want notifications for, and because of some other issues, our systems administrator turned off the service, an action that our customers hardly noticed.
I had been given the freedom to explore what was possible with the resources at my disposal. The savings of a few hundred dollars per year in vendor fees might not be seen as worth the investment of time and effort, but the experience the library gains in launching and sustaining a service on its own, from the coding experience to the practice of involving the patrons who will use the service in its development, is a powerful motivator. Also, such a service would provide test data that we could use to better understand our patrons’ reading choices. Excited at the possibility of further in-house development of readers’ advisory solutions, I decided to proceed with a Drupal-based solution after initial testing proved the viability of the project. The development process took around a year, with the project put on the back burner for months at a time. If ongoing efforts to improve the technology, normalize code and bring the modules into a neat package are successful, libALERTS could be shared easily with other libraries.

Minimum Viable Product

The goal for the initial development was to understand what tools I could find, and what skills I needed to learn, in order to get a pilot project up and running. This minimum viable product [1] would be used for initial testing to determine whether an actual service could be developed. If the basic service functioned in initial testing, a fuller version would be developed for a testing program using a group of patrons.

Accepting constraints: A development team of 1, a budget of 0

Normally it would be customary to seek input from patrons about the value of such a service, but the most important question for me was whether it was possible. I had no budget for the project. If, in the process of gauging patron interest, we engendered an expectation that the service would be forthcoming, that could be problematic. In order to provide such a service, I would need publishing data.
Running reports from the ILS could prove time consuming on an ongoing basis. Also, due to budget cuts in recent years, all departments have been running lean. I wanted to do everything I could to make the service fit into our operations as neatly as possible, so I asked the library’s cataloger and the Acquisitions Specialist about the procedures for ordering. The latter explained that she downloads shell MARC records from publishers, which are loaded into the ILS after an order is completed, and that those MARC records were then simply deleted from her computer. The cataloger arranged for her to download the MARC files into a shared folder using a desktop shortcut; instead of deleting the files, she would leave them there for me to process for libALERTS. She also offered to add suffixes to file names to indicate which Dewey category, format, and age group the files were for. This arrangement didn’t change her download process at all.

Based on the expectation that the service would appeal to fans of prolific, mainstream authors, the library’s systems administrator ran a set of reports of bestselling titles in MARC format, which I uploaded in order to populate the site with a good representation of popular authors. Also, for a time, I uploaded a large number of MARC records without considering whether patrons would want to subscribe to the authors in those records in order to receive notifications when their new titles were published. A future maintenance task will be to delete authors and content that have been in the system for a certain amount of time and have no subscriptions attached to them.

Drupal

The library’s public website and intranet are built on the popular content management system Drupal (version 6) in multi-site setups, and both sites feature subsites.
libALERTS functions as a subsite of the main website. I explored available modules and quickly found that I could piece together suites of contributed modules into a mechanism for patrons to subscribe to authors and receive email notifications when new content by those authors was added to the site. Each libALERT is a node of a custom content type built with the powerful Content Construction Kit (CCK) module. The content type features a custom field for a book’s ISBN, which is applied as a variable in a template to complete an img element’s src attribute, so that Syndetics cover art is displayed for each title (see Example 1). I created a custom template to “theme” the libALERT content type (“theme” being Drupal-speak for changing the layout).

Figure 1. node-libalert.tpl.php and sample node.

Critical Modules

Notifications: provides subscription and notification functionality. The Taxonomy Notifications module, which is part of the Notifications group of modules, allows users to subscribe to taxonomy terms, which is how I stored author names.

SMS Email Gateway/SMS Framework: enables the transmission of text messages using email gateways. SMS Email Gateway used to be part of the SMS Framework but is now its own project. [2] However, the SMS Framework module is better supported and is still under development.

MARC module: allows Drupal nodes to be created by importing MARC record data.

Upon finding another suite of modules that would provide the ability to send text messages, and discovering that the Ohio Public Library Information Network provides libraries with SMS capability for free, I was eager to include text messages as a possible mode of alert delivery. Getting text messaging to work, however, required hacking the SMS Email Gateway module. Initially, this was for diagnostic purposes, in the hope that it would give me insight into a fix for the module. Essentially, I hacked the module to force the gateway address for the carrier.
This hack appears in a thread that I started on the Drupal forum [3]:

In sms_email_gateway.module, Line 194, I changed to

I have not tested the patches in the thread noted above, nor explored the latest version of the SMS Framework to see if it will handle text messaging. This hack remains a part of the service, but redressing it is a goal of future improvements to the site.

Simulating taxonomy search

Drupal’s search system does not search taxonomy terms. Given that the authors’ names are stored as taxonomy terms, I needed a mechanism for search. The best that I could do was to use a setting in the Views module. I exposed the taxonomy term filter, in effect creating a search box. Then I set “exposed form in block” to “Yes”, creating the search block.

Figure 2. Exposed taxonomy term filter.

Proof of concept: staff testing

Staff testing began with an initial group of subjects from the library’s Technology Team, which I lead. I had them sign up, add authors and subscribe to them. Those with cell phones who were willing to receive texts as part of the testing process were enlisted. I found as many types of cell phones and smart phone operating systems as possible. After this initial testing of the core alert-transmission system, I expanded staff testing to any employee willing to participate. Testing focused on the basic functionality of the site: subscribing to authors, and transmission of alerts triggered by the uploading of data from MARC records into Drupal nodes. Core functionality, for both email and text message recipients, proved successful.

Post proof of concept: the survey

The library was planning to run a series of short surveys in the near future. Out of fear of survey fatigue, I was permitted to run only a short survey, on the website and as a handout, for up to two weeks. The survey aimed to capture interest among readers in such a service and to learn which authors respondents would like to subscribe to.
While the number of respondents was only 50, interest in the service was very high (see Table 1). The authors that respondents would subscribe to read like a bestsellers standing order. Not every reader follows authors so closely, but if a core of voracious readers would be satisfied with this service, libALERTS would certainly hit that sweet spot of mainstream fiction. Forty out of fifty survey respondents added a total of 234 authors, for an average of 5.85 per respondent who listed any. But several respondents also added phrases like “plus many more” or “hundreds of others” to their list of authors.

Question                                      Response                                                 Number
How likely would you be to use this service?  Somewhat likely                                          6
                                              Very likely                                              8
                                              Would definitely use it                                  36
How many books do you read a year?            3 or fewer books per year                                1
                                              4 to 6 books per year                                    1
                                              7-9 books per year                                       2
                                              10-12 books per year                                     2
                                              More than 12 books per year                              43
Who are your favorite authors?                Left blank                                               10
                                              User entered value                                       40
                                              Average submission length in words (excluding blanks)    13.68

Table 1. Survey Responses.

Dealing with the data from both the staff testing phase and the survey revealed some major issues that I would have to address before turning this into a real service.

A Pre-process interface

After the concept of the site had been proven, the initial development process revealed a couple of major concerns that had to be addressed for the actual service to work. First, for an alert to be triggered, the author name that appears in a MARC record must match the author name in the libALERTS site exactly. If a patron fails to receive an alert because of a dropped middle initial, then the service fails. The MARC records that we receive from publishers can include variations in a name form, or misspellings. Also, authors may have very similar names: David A. Adler and David D. Adler both write children’s books, but are different authors. I found nothing in Drupal that would allow comparison of MARC records to the contents of the database.
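To make the first concern concrete, the following PHP sketch compares name forms the way an exact-match trigger would. The helper `compare_names()` and the inverted name forms are hypothetical, introduced only for illustration; they are not part of the libALERTS code or the library's data. The sketch also reports the edit distance from PHP's built-in `levenshtein()` function, to show how close a different author's name can be to a stored name.

```php
<?php
// Illustrative sketch only: compare_names() and the name forms below are
// hypothetical and are not taken from the libALERTS code or its database.

/**
 * Compare a stored author name with an incoming MARC name form.
 * Returns whether the two strings match exactly and how many
 * single-character edits separate them.
 */
function compare_names(string $stored, string $incoming): array
{
    return [
        'exact'    => $stored === $incoming,
        'distance' => levenshtein($stored, $incoming),
    ];
}

$stored = 'Adler, David A.';

// Same author with the middle initial dropped: not an exact match,
// so an exact-match trigger would send no alert.
$variant = compare_names($stored, 'Adler, David');

// A different author whose name differs by a single character.
$other = compare_names($stored, 'Adler, David D.');

printf("variant: exact=%s distance=%d\n", var_export($variant['exact'], true), $variant['distance']);
printf("other:   exact=%s distance=%d\n", var_export($other['exact'], true), $other['distance']);
```

Under these assumed name forms, the dropped-initial variant is three edits away from the stored name while the different author is only one edit away, which suggests that closeness alone may not be enough to decide a match automatically.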
The second concern was the size of the site. libALERTS was never intended to duplicate all content in the catalog, but all MARC records that the library receives from publishers would have to be processed in order to catch every possible subscription. Surplus content would make it harder for users to find desired authors. So I only wanted to upload MARC records for authors to whom at least one patron had subscribed. As libALERTS is designed to be as automated as possible, I needed to solve these problems programmatically.

File_MARC

The preprocess interface reads and writes MARC records using the File_MARC library, which is installed via the PEAR Package Manager as part of a server’s PHP installation. This library can be used with Drupal’s MARC module, which is discussed later in detail.

Selectively adding content

Because libALERTS lets users add authors not yet included in the site, and the site was pre-populated with hundreds of authors, I decided that the MARC records should be analyzed so that only records by authors with at least one subscription attached would be included. The example below shows author values gathered by a SQL query of the database. Authors are then read from each MARC record, and each one is compared against the subscribed authors:

require 'File/MARC.php'; // PEAR File_MARC library

/*
 * Get array of authors in libALERTS that have subscriptions associated with them
 */
// Get term names for authors that have subscriptions attached.
$subauth = mysql_query("SELECT DISTINCT tid, name FROM term_data INNER JOIN notifications_fields ON notifications_fields.value = term_data.tid WHERE notifications_fields.field = 'tid';");

/*
 * Displays all authors in a marc file -- needs form input to choose file
 */
// $filename, $objkey and $objarray are set elsewhere in the interface.
$newbooks = new File_MARC('[path to directory]' . $filename);
while ($record = $newbooks->next()) {
    $objkey++;
    // Field 100 holds the main entry (personal name).
    $newauthors = $record->getFields('100');
    if ($newauthors) {
        foreach ($newauthors as $field) {
            // Strip the leading tag/indicator/subfield-code prefix from the field's string form.
            $author = substr($field, 9);
            // Rewind the result set so every author is checked against all subscribed names.
            mysql_data_seek($subauth, 0);
            while ($rows = mysql_fetch_array($subauth, MYSQL_ASSOC)) {
                $pairs[] = implode('|', $rows);
                foreach ($pairs as $pair) {
                    $final = explode('|', $pair);
                }
                // Keep the record only when its author exactly matches a subscribed name.
                if (in_array($author, $rows)) {
                    $newrecords[] = $objarray[$objkey];
                    echo 'Match found for ' . $author . '';
                    echo '';
                }
            }
        }
    }
}

The Levenshtein distance

The process of comparing two strings (the author name as it appears in the MARC record, and the existing name in the libALERTS database) involved reading the author names from the MARC records and creating an array of the author names in the database to which a site member had subscribed. In PHP, the levenshtein function calculates “the minimal number of characters you have to replace, insert or delete to transform” the first string into the second string. This measurement is the Levenshtein distance, or the edit distance. [4]

In the following example, author names from MARC records for which there is no exact match in the database are compared against the database with the levenshtein function to see if there is anything close. The code below successfully detects name forms in MARC records that are off by as few as one character when compared to the form of the name in the database.
/*
 * Calculate the edit distance with levenshtein()
 * Example from http://us2.php.net/manual/en/function.levenshtein.php
 */
// evaluate edit distance
// no shortest distance found, yet
$shortest = -1;

// calculate the distance between the input word,
// and the current word
$lev = levenshtein($author, $row);

// check for an exact match
if ($lev == 0) {
    // closest word is this one (exact match)
    $closest = $row;
    $shortest = 0;
    $newrecords[] = $objarray[$objkey];
    break;
}

// if this distance is less than the next found shortest
// distance, OR if a next shortest word has not yet been found
if ($lev <= $shortest || $shortest < 0) {
    // set the closest match, and shortest distance
    $closest = $row;
    $shortest = $lev;
}

if ($shortest > 0 && $shortest < 5) {
    echo '