The Code4Lib Journal – Diva.js: A Continuous Document Viewing Interface Mission Editorial Committee Process and Structure Code4Lib Issue 14, 2011-07-25 Diva.js: A Continuous Document Viewing Interface Diva.js is a multi-page browser-based document viewer designed to present high-resolution digitized document images as a continuous, scrollable item. This article examines the current state of the art in online document display technologies, and presents a list of functional requirements the authors used to guide the creation of this new online document viewer. The authors then discuss the image processing infrastructure necessary for deploying the Diva.js viewer, and present a brief discussion of how the viewer is currently deployed in their organization. by Andrew Hankinson, Wendy Liu, Laurent Pugin, and Ichiro Fujinaga Introduction Diva (Document Image Viewer with Ajax) is a multi-page document image viewer, designed for digital libraries to present documents as a single, continuous item, rather than the traditional method of presenting page images one at a time. Using “lazy loading” methods for only loading those parts of a document that are currently being viewed, it presents a quick and efficient way of displaying hundreds of page images from digitized books and other documents on a single web page. The Diva project centers around Diva.js, a unique jQuery plugin that provides a user interface for viewing document images. The design of this component was informed by a survey of existing document image viewing interfaces, and from this survey a number of functional requirements for how document images should be displayed in a browser were formulated. While the core of the Diva project is the jQuery plugin, other image processing infrastructure is required to serve the images in an efficient manner. We discuss how the components of this infrastructure work together in the “How it Works” section. As an open-source project, we welcome feedback and code contributions. We are currently using Diva as part of an ongoing research project in digital music documents, but are interested in how it can be used for other document digitization projects in libraries and archives. History and Motivation Diva began as part of a project by the Swiss working group for the Répertoire International des Sources Musicale (RISM) project [1]. RISM is an international initiative, founded in 1952, whose purpose is to identify and catalogue musical sources held in libraries and archives around the world. The RISM database contains extensive information about each source in its catalogue, and is an important tool for discovery and source verification for music researchers. In 2008, the Swiss working group for the RISM project was granted funding for an exploratory project in digitizing musical sources (prints and manuscripts) from Swiss composers held in libraries and monasteries across Switzerland. The goal of this project was to combine the extensive, research-quality metadata gathered over decades with high-quality images of the source itself, and publish these online in a freely accessible database. This would allow researchers to access the sources globally, and provides an important component for the preservation and promotion of Swiss national musical heritage [2]. At the beginning of this exploratory project, a partnership between the Swiss RISM and the Distributed Digital Music Libraries Laboratory (DDMAL) at the Schulich School of Music of McGill University was tasked with identifying the tools and interfaces that would be used for displaying the source images in a web browser. To identify the current “state of the art” we conducted a survey of 24 digital libraries. Based on this survey, we identified a list of functional requirements that we wanted in a document image viewer that was specifically designed for displaying musical sources. This survey is discussed extensively by Hankinson, Pugin, and Fuginaga (2009) [3], but we present a brief summary here. Survey We formulated five functional requirements from our survey in response to what we considered to be the best and worst features of current digital document image displays. While they were developed in a musical context, we think that they are broadly generalizable over all document types. 1. Preserve Document Integrity One of the most common methods for displaying document images online is in an “image gallery” format, where users are required to click “next” and “back” hyperlinks to navigate page by page through the document. Some interfaces provide users with the ability to navigate to a specific page, either using a drop-down selection, a text box, or a series of hyperlinked page numbers. This method of display does little to preserve the original document as a cohesive entity. There are several alternatives to this approach. The Google Books project [4], perhaps the largest and most well-known book digitization project, presents items as a single scrollable entity embedded within the webpage. This allows users to very quickly scroll and navigate through the entire work. The Internet Archive [5] method uses a book metaphor where users are presented with images of facing pages and can navigate through the book by “turning” the pages. This method presents some usability challenges for quickly navigating through a source, but it does provide a very accurate representation of the original source layout. 2. Allow side-by-side comparison of items Music, especially in older sources, is often published as “parts,” where each musician’s score contains only the music for their instrument or voice. These parts are bound in separate physical books or leaflets. As such, reconstructing the entire musical piece requires viewing multiple books simultaneously. Providing the ability to view two or more sources simultaneously, therefore, is important for comparing sources or co-reading a source along with its criticism or annotations. 3. Provide multiple page resolutions For older materials, especially manuscripts, the ability to view details such as faint pencil markings or different ink colours is necessary. High-resolution images allow the user to identify small details, which is especially important in tasks such as reading cursive script or identifying faded or obscured portions of the image. On the other hand, smaller low-resolution images give users the ability to very quickly get an overview of the entire document. The ability to move between high- and low-resolution images is a feature of many existing digital library systems, with many choosing to display three image resolutions: “thumbnail” for overviews of the entire document, “browser safe” for displaying individual pages that fit on a users’ screen, and a “high resolution” view for detailed image study. The state of the art for this type of document viewing can be seen in systems like the World Digital Library [6] [Figure 1], where the user selects from a list of thumbnails provided in the left column to zoom in to see details in the full-page display. Figure 1. World Digital Library. 4. Optimize document loading Presenting large page images, especially at higher resolutions, makes it difficult for users to quickly leaf through a document. Two common approaches are to allow users to download the high-resolution images one at a time, or have them download the entire document encapsulated within a container format like PDF. For ultra-high resolution documents, however, this download may involve hundreds of megabytes, and the user must wait until the entire document is downloaded before viewing it. The Google Books and Internet Archives interfaces use optimization techniques that load page images only on demand. As a user scrolls through the document, the browser sends an asynchronous HTTP request (“Ajax”) to the server to deliver the images for the pages that the user is currently viewing. This can be further optimized for images at very high resolutions by borrowing a technique from mapping sites where high-resolution satellite photos are broken into smaller tiles, which may then be served to the user in parallel. 5. Present item and metadata simultaneously While not part of the document viewer itself, the ability to view the item and its catalogue metadata simultaneously is a highly desirable feature. This requirement takes on a more significant role when integrating the viewer into a larger context, and points to the need for a document viewing system that can be easily integrated into existing digital library environments, including an integrated set of controls or callbacks for manipulating page zooming and scrolling. Existing implementations Although we compiled our list of functional requirements by examining features from existing implementations, we could find no single existing implementation where all features were present. As part of our survey, we examined some standalone document viewing frameworks, including Zoomify [7], the IIPMooViewer (and variants) [8], and the Seadragon Ajax viewer [9]. We determined that these viewers were designed as generic high-resolution image viewers, and as such did not present a document as a complete cohesive entity. Google Books has a promising document image viewer, but it is currently not available as an open-source project. At the time of our survey, the Internet Archive used the DjVu format and a page-turning book interface that ran in the browser as a Java applet. They have since developed a new BookReader [10] interface, currently the only other open-source implementation known to encapsulate many of our functional requirements. Their interface offers many interesting features, such as contact sheet and facing-page views. However, their implementation does not use a tiling system for downloading page content, meaning that at higher resolutions the entire page is downloaded to the browser as a single image. Towards Diva.js Our first implementation was written as a component to a larger ExtJS interface for managing digital images, part of an in-house image system for the Swiss RISM. It was not released publicly, but it has been adopted by a number of other projects related to the RISM. While the ExtJS component worked very well, it required loading the entire ExtJS library, including superfluous components, which added significantly to the loading time for pages. The decision was made to port the existing system to jQuery, a widely used JavaScript framework that is smaller and more modular in design, with faster load and execution time than ExtJS. Development on the jQuery plugin began in February 2011, using the same design requirements as the original ExtJS component. In the next section we describe the components of a complete Diva.js installation, including image processing, server infrastructure, and the features of the jQuery plugin. How It Works The most visible and unique component of this project is the JavaScript-based document viewer itself, but this must be backed by efficient server software to process and deliver multi-resolution images to the user. A brief overview of how these components fit together is given in Figure 2. Figure 2. The components of the Diva.js infrastructure. Image Processing To switch between multiple sizes of page images, Diva uses a multi-resolution, or “pyramid” image format [Figure 3], where the original high-resolution image files are pre-processed into several sizes of smaller resolution images that are themselves sub-divided into an array of tiles of a given size. We have found tiles of 256×256 pixels work best. These images can be created by several open-source image-processing utilities. We have found that the VIPS [11] image processor is the most efficient and provides the most reliable results, but ImageMagick [12] could also be used. Figure 3. Multi-resolution, or “pyramid” image structure. There are two popular image formats that provide multi-resolution image capabilities, TIFF and JPEG2000. While JPEG2000 is more modern and arguably provides more fine-grained control over the number of image resolutions that can be served, there are no open-source libraries for JPEG2000 image processing and serving. The Kakadu [13] library, currently the most efficient and best supported JPEG2000 library, requires a licensing fee for implementers. For version 1.0 of the Diva project we chose to support only multi-resolution TIFF images, but JPEG2000 support is scheduled as a desired feature for the next version. In the multi-resolution TIFF format, the number of image resolutions available for a given image is determined by the maximum resolution of the original file and the size of the individual tiles. The formula for calculating the number of resolutions (n) available for a given image size is given in Equation 1. Equation 1. Calculating the number of resolutions based on image and tile sizes. The largest dimension, D, of the image (width or height) and the tile size, Ts, are both given as pixels. For example, an image with the largest dimension of 4120 pixels and a tile size of 256 pixels would be processed into six zoom levels. A Python script, process.py, is available in the Diva repository that will process a directory of images into multi-resolution TIFF files using the VIPS Python module. Tile Image Server The tile image server takes a request for a segment of the image at a given resolution and returns that portion of the image as a tile. It runs as a FastCGI process and can be integrated into most popular web servers. An example request to the tile server might look like: http://example.com/iipsrv.fcgi?FIF=/tmp/image-75.tif&JTL=2,0 The JTL parameter in the request specifies (resolution,tile) which, in this case, is the first tile from the third resolution (resolution numbers are zero-indexed) for image-75.tif. Although there are several potential image tile servers, we currently only support the IIP Image Server [14]. While the IIP Image Server presents a very fast method of processing and serving image regions, we found that a single FastCGI process could not provide an adequate response time when dealing with the highest resolution images. Our current setup uses the Nginx web server as a front-end and load balancer, providing round-robin balancing across five FastCGI processes. The IIP Image Server also has the ability to utilize a Memcached database for caching tiles. Our current implementation uses five Memcached stores. We have found that this setup provides acceptable performance, even on slower connections. Data Server The data server is responsible for sending information about the entire document to the browser component. This information is used by the browser to construct the container HTML elements required to display the entire document and its constituent pages. This data server has access to a number of stored properties about the document, and can then construct a JSON response to send to the browser containing information about the total document height at a given zoom level (i.e., the total height of all images in the collection), the average and maximum heights and widths of the images (for the inter-page margins and ensuring the document is centred in the browser) and the dimensions of each page at a given zoom level. The data server may be implemented in any server-side language. We are currently using a Django web application with a SQLite database for storing image information. However, we are also bundling a basic PHP script, divaserve.php, for providing filesystem-based image serving. This PHP script caches the required information for each image as a plain text file, which is then read and served when the user requests a document. This has the advantages of running on any PHP-equipped web server, and not requiring a complete database setup for serving small collections of image files. jQuery Plugin The Diva.js jQuery plugin consists of a small number of JavaScript, CSS, and image files used for displaying the images in the browser. The core jQuery and the jQuery UI libraries are both prerequisites for the plugin, and we bundle the Dragscrollable [15] plugin with Diva.js. The following code block shows a typical instantiation of the Diva.js component. ...