‡biblios: An Open Source Cataloging Editor

The Code4Lib Journal, Issue 5, 2008-12-15

‡biblios is an open source cataloging editor designed to allow libraries to perform copy and original cataloging in a web-based environment. ‡biblios allows users to search for, edit, and save bibliographic records in the MARC21/MARCXML formats. It also allows users to send records directly to integrated library systems such as the Koha ILS. Where most MARC editors are part of an integrated library system (and therefore require logging in), ‡biblios allows users to catalog with an open source standalone system available anywhere via a web browser. Unlike other cataloging editors, it offers an attractive user interface for searching, saving and editing cataloging records. This article describes the system architecture and design of ‡biblios.

by Chris Catalfo

History

‡biblios was a project to develop a web-based cataloging editor, suitable for use with the Koha ILS or with other ILSs, that I put forward to LibLime as a proposal for the 2007 Google Summer of Code. When it was accepted, I served as lead programmer for the project and Joshua Ferraro, CEO at LibLime, mentored the work as the system architect. At the end of the Summer of Code, we had a minimally functional web application able to search several Z39.50 targets and edit records in an integrated MARC editor.

The name, ‡biblios, pronounced “biblios,” has no special meaning, although it obviously evokes the idea of “books.” The double-dagger symbol is commonly used as a subfield delimiter in cataloging, and forms an ideal logo for ‡biblios because the symbol can be represented in both graphical and textual formats. Inspiration for this use of a symbol in a logo was drawn from the Ümlaut and *Asterisk projects.

Since the Google Summer of Code, ‡biblios has continued to be developed at LibLime, Inc.
In the past year of work, we have made a number of enhancements and changes. We switched from using a Perl CGI script to search Z39.50 targets to using the PazPar2 metasearch tool developed by IndexData. This has greatly improved the searching experience. It has also enabled us to show search facets, which PazPar2 provides as part of its web service. Besides these changes, we also added the capability to define macros to run on records, using JavaScript to manipulate MARCXML records.

Systems Architecture

The system architecture of ‡biblios consists of:

- a rich internet application, built with the ExtJS toolkit and Google Gears, which provides the user interface
- a set of CGI scripts providing server-side services
- the PazPar2 search middleware

The rich internet application provides the front end user interface and allows the user to search, select, and edit records. Google Gears is a web browser extension which allows web applications to store data in an SQLite database on the user’s computer. This allows ‡biblios to save records to the user’s computer. The CGI scripts provide back end functionality (such as exporting records), while PazPar2 allows searching multiple Z39.50 targets.

Figure 1: System Architecture

Front end

The front end consists of a single web page, developed using the ExtJS [1] JavaScript toolkit. The user never navigates away from this single page while using ‡biblios; in this sense it is like a desktop application.

ExtJS provides a very rich set of widgets to use in constructing web applications. ‡biblios uses most of these widgets in its user interface, where each area utilizes an ExtJS Panel. These panels are frames on the screen which can contain other widgets such as grids or sub-panels. There are several TreePanels for interacting with hierarchical data such as Z39.50 search targets or folders containing saved records.
TreePanels can be loaded dynamically via AJAX (as they are when presenting search facets) or they can be loaded from the Google Gears database (as when displaying folders of saved records). There are a number of GridPanels for interacting with tabular data (such as search results or lists of search targets). The GridPanels handle record selection via mouse clicks or the arrow keys, as well as sorting of columns of data. Finally, ‡biblios uses several dialog windows for actions such as uploading files.

The user interface is designed to emulate a web-based email client. The left-hand sidebar offers a selection of ‘resources’ (Z39.50 search targets, folders to save records into, and records to create). In the center panel there is generally a grid displaying either search results or records in a folder. When editing a record, the screen changes to show the MARC editor.

Figure 2: ‡biblios UI

‡biblios also uses ExtJS’s GridPanels for interacting with bibliographic records and other data stored in the Google Gears database, as well as to view search results. These provide a user interface for interacting with tabular data; they also provide paging buttons, record selection, toolbars, and loading indicators. ExtJS grids can be configured to work with various data sources: simple JavaScript arrays of data, external data sources (with results returned by AJAX calls), or custom data sources. ‡biblios makes use of a custom data store developed for Google Gears to allow for viewing bibliographic records and “search” and “send” targets. It also feeds the Z39.50 search results returned by PazPar2 into an ExtJS data store for viewing.

Figure 3: ‡biblios ExtJS Widgets

Each action performed in ‡biblios that needs server-side processing uses AJAX requests to send data from ‡biblios and to receive data from the server. In this way there is no need to wait for page reloads to complete actions.
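
To illustrate the request pattern described above, here is a minimal JavaScript sketch of how a browser-side call to a server-side script might be assembled as a form-encoded POST. The script path `/cgi-bin/example.pl` and the parameter names `format` and `xml` are assumptions made for illustration, not the actual ‡biblios interface, and the sketch only builds the request description; in the application the request would be dispatched through the toolkit's AJAX facilities.

```javascript
// Encode a flat object of parameters as a form-encoded request body.
function encodeForm(params) {
  return Object.keys(params)
    .map(function (key) {
      return encodeURIComponent(key) + '=' + encodeURIComponent(String(params[key]));
    })
    .join('&');
}

// Describe a POST request to a server-side script.
// The URL and parameter names passed in are hypothetical examples.
function buildPost(url, params) {
  return {
    method: 'POST',
    url: url,
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: encodeForm(params)
  };
}

// Hypothetical usage: send a MARCXML record to a conversion script.
var request = buildPost('/cgi-bin/example.pl', {
  format: 'marc21',
  xml: '<record><leader>00000nam a2200000 a 4500</leader></record>'
});
console.log(request.body);
```

Because the whole exchange is described by a plain object, the page can hand it to an asynchronous request and update only the affected widget when the response arrives.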
Figure 4: ‡biblios Data Flow

Use of Google Gears

‡biblios makes use of the Google Gears [2] browser plugin for storing bibliographic records. Google Gears allows the browser to store data from web applications in an SQLite database available to the web application. Each site that makes use of the plugin may create a database and modify that database, but may not modify the databases of other sites.

Google Gears was chosen as a means to allow users to view and manipulate records they have previously saved from search results or to view records they have created. When returning to ‡biblios, users are able to view these previously saved records.

Although saving records to the user’s computer (in the form of the embedded SQLite database) is handy, in the future this code may be separated out into a plugin that provides offline access to records. At times it has proven difficult to manage data in users’ Gears databases because the database may contain stale data, such as configuration data, or the web application may be expecting more recent data. Separating the Gears code out into a plugin would also allow browsers which aren’t supported by Google Gears to access the site.

Use of PazPar2 search middleware

PazPar2 [3] is a server developed by IndexData which allows searching multiple Z39.50 databases simultaneously and returning those results via a web service interface. ‡biblios uses PazPar2 to search user-defined Z39.50 search targets.

In the original design for searching from ‡biblios using PazPar2, a JavaScript library provided by IndexData was used to communicate with PazPar2 from the web browser. This script lets the browser itself issue PazPar2 requests, routed through a proxy server such as Apache. Because web browsers are forbidden from making AJAX requests to domains or ports other than their own, it is necessary to use a proxy between the browser and the PazPar2 server.
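
The browser restriction described above can be sketched in a few lines of JavaScript. The host name, the port `9004`, and the proxy path `/pazpar2/search.pz2` are assumptions chosen for illustration, and `sameOrigin` and `proxiedUrl` are hypothetical helper names; `search`, `session`, and `query` follow the command and parameter names of the PazPar2 web service.

```javascript
// Two URLs share an origin only if scheme, host, and port all match.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Build a same-origin URL that a proxy could forward to PazPar2.
// `proxyPath` is a hypothetical location; `command` is a PazPar2
// web service command such as "init", "search", or "show".
function proxiedUrl(proxyPath, command, params) {
  var query = new URLSearchParams(Object.assign({ command: command }, params));
  return proxyPath + '?' + query.toString();
}

// Assumed deployment: the page and PazPar2 share a host but not a port.
var page = 'http://catalog.example.org/biblios/index.html';
var pazpar2 = 'http://catalog.example.org:9004/search.pz2';

console.log(sameOrigin(page, pazpar2)); // false: the ports differ
console.log(sameOrigin(page, 'http://catalog.example.org/pazpar2/search.pz2')); // true
console.log(proxiedUrl('/pazpar2/search.pz2', 'search', { session: '12345', query: 'ti=hamlet' }));
```

In this sketch the browser only ever requests the proxied path on its own origin, and the web server is assumed to forward that path to the PazPar2 port.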
Recently ‡biblios has moved to using a Perl proxy script and associated Perl module (paz.pl and PazPar2.pm) to route requests between ‡biblios and PazPar2. This has greatly simplified the JavaScript code in the browser, as it no longer has to deal with maintaining sessions with PazPar2.

By default, PazPar2 does not include full MARCXML records in the brief metadata it returns for each search result. For ‡biblios, the PazPar2 configuration files were modified so that ‡biblios receives the full record upon performing a search. This retrieval of the full record slightly slows down search performance, but it greatly speeds up previewing and saving records from the search results grid. This modification also allows users to save large batches of records from their search results into either their ‘save folders’ (Google Gears database) or to their computer.

Use of CGI scripts

‡biblios uses several CGI scripts written in Perl to provide some functionality that is not easily implemented in the browser, or is better implemented on the server.

Figure 5: CGI Scripts

The following CGI scripts are used in ‡biblios:

- downloadMarc.pl: downloads records from ‡biblios to the user’s computer. This script accepts a POST of MARCXML data. It then uses the MARC-Record suite of Perl modules [4] to convert from MARCXML to a user-determined record format that is returned to the browser.
- download.pl: generic handler for downloading temporarily saved records. Returns the name of a temporary file containing MARCXML records (produced by downloadMarc.pl or uploadMarc.pl) to the browser.
- exportdb.pl: exports Google Gears database data to the user’s computer. ‡biblios sends serialized JSON to this script and the script instructs the browser to download a temporary file containing that serialized JSON. This file may then be saved to the user’s computer and reimported into ‡biblios.
- paz.pl: CGI proxy script for sending requests to the PazPar2 search middleware. This script acts as a proxy between ‡biblios and the PazPar2 server. It accepts requests from ‡biblios and routes them to a PazPar2 server. It restarts sessions with PazPar2 as required when running new searches.
- PazPar2.pm: a Perl module, originally developed by Galen Charlton for the Koha ILS and subsequently modified to respond to more PazPar2 request types.
- uploaddb.pl: script for uploading data to insert into the Google Gears database. Returns data uploaded by the user to ‡biblios for entering into the Google Gears database.
- uploadMarc.pl: handles uploading of files of MARCXML or MARC21 records into ‡biblios. Accepts a file of MARCXML or MARC21 records, converts them to MARCXML and returns this data to ‡biblios for entering into the Google Gears database.
- XSLTransform.pl: accepts a stylesheet and XML data to transform with that stylesheet. Uses the LibXML suite of Perl modules [5] to perform the XSLT transformation. ‡biblios uses this to generate the MARC21 editor and to generate previews of MARCXML records.
- kohaws.pl: proxies web service requests from the ‡biblios Koha plugin to an actual Koha installation. The Koha installation responds with an XML document and this script returns that document to ‡biblios for further processing or display.
- authoritiessruproxy.pl: proxies SRU queries to an SRU server. Used for querying authority records when editing an authority-controlled field in the MARC editor.

The choice of Perl as the CGI scripting language for ‡biblios was predetermined by its having started as a cataloging editor for the Perl-based Koha ILS. Perl also has robust support for MARC21 record handling.
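
The session handling attributed to paz.pl above can be sketched as follows. The real script is written in Perl; this version is in JavaScript only to match the front end code, and it is not the author's implementation. `PazProxy`, `transport`, and `fakeTransport` are hypothetical names, the policy of starting a fresh session before every search is an assumption, and the transport is synchronous purely to keep the sketch short. The `init`, `search`, and `show` commands and the `<session>` element in the `init` reply follow the PazPar2 web service.

```javascript
// Minimal sketch of a session-keeping proxy. `transport` is a hypothetical
// function that sends a query string to PazPar2 and returns the reply text.
function PazProxy(transport) {
  this.transport = transport;
  this.session = null;
}

// Start a session with the "init" command and remember the session id.
PazProxy.prototype.init = function () {
  var reply = this.transport('command=init');
  var match = /<session>([^<]+)<\/session>/.exec(reply);
  if (!match) {
    throw new Error('PazPar2 did not return a session id');
  }
  this.session = match[1];
  return this.session;
};

// Forward a command, starting a fresh session before each new search
// (assumed policy) or whenever no session exists yet.
PazProxy.prototype.send = function (command, params) {
  if (command === 'search' || this.session === null) {
    this.init();
  }
  var query = new URLSearchParams(
    Object.assign({ command: command, session: this.session }, params || {})
  );
  return this.transport(query.toString());
};

// Fake transport standing in for an HTTP call to a PazPar2 server.
var issued = 0;
function fakeTransport(queryString) {
  if (queryString === 'command=init') {
    issued += 1;
    return '<init><status>OK</status><session>' + issued + '</session></init>';
  }
  return 'forwarded: ' + queryString;
}

var proxy = new PazProxy(fakeTransport);
console.log(proxy.send('search', { query: 'au=melville' }));
console.log(proxy.send('show', { start: '0', num: '20' }));
console.log(proxy.send('search', { query: 'ti=typee' }));
```

Keeping the session id on the server side in this way means the browser code never needs to know it, which is the simplification the article describes.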
The CGI scripts require the following Perl modules:

- CGI
- LWP::UserAgent
- CGI::Carp
- MARC::Record
- MARC::Batch
- MARC::File::XML
- File::Temp
- File::Basename
- JSON
- CGI::Session
- Data::Dumper
- XML::LibXML
- XML::LibXSLT
- XML::Simple

Communication with Integrated Library Systems

‡biblios has the ability to retrieve records from and save records to external Integrated Library Systems (ILS). As of this writing there exists a plugin for the Koha ILS [6]. The plugin queries Koha for the most recent version of a record found in the search results. The user is able to edit this record and then send it to Koha. Koha saves the record to its internal database and returns the record (with possible additions of item record tags) to ‡biblios for further editing.

The plugin makes use of a simple web services API developed by Galen Charlton, Vice President of Development at LibLime, Inc. The API calls for the following methods, implemented in a RESTful way by the ILS:

- authenticate: authenticate the ‡biblios application to the Koha ILS
- bibprofile: retrieve a list of tags and subfields from Koha which must be present or which must have specified values (such as item location subfields)
- retrieve: given a biblionumber, retrieve the most recent version of this record from Koha’s internal database
- save: save a MARCXML record from the ‡biblios editor to Koha

The API is fully documented on the ‡biblios website [7].

Generation of MARCXML Editor

Since MARCXML is a simple XML format, ‡biblios generates an editor for MARCXML records using an XSLT stylesheet. The stylesheet generates input fields for each of the subfields, indicators and tag numbers in a record. It also generates a fixed fields editor for MARC21 records. This editor uses an XML description of MARC21 fixed fields to gather data from the leader, 006, 007, and 008 fields and to generate HTML