The Code4Lib Journal
Issue 18, 2012-10-03

Using PHP to Parse eBook Resources from Drupal 6 to Populate a Mobile Web Page

The Ursula C. Schwerin Library needed to create a page for its mobile website devoted to subscribed eBooks. These resources, however, were only available through the main desktop website, where they were organized using the Drupal 6 content management system with contributed and core modules. It was necessary to create a solution to retrieve the eBook databases from the Drupal installation and display them on a separate mobile site.

By Junior Tidal

Introduction

When launching our mobile website, the New York City College of Technology’s (CUNY) Ursula C. Schwerin Library needed to create a page devoted to our subscribed eBook databases. These resources, however, were only available through our main website. It was therefore necessary to create a solution to retrieve the eBook databases from our Drupal-based desktop website and display them on a separate mobile site. As a solution, we used the PHP Simple HTML DOM Parser. This free PHP library allows administrators to parse, scrape, or extract content from HTML pages.

Drupal’s Core and Extended Modules

The eBook resources are stored within the Drupal content management system (CMS). Using the extended Drupal modules Content Construction Kit (CCK) and Views, in conjunction with the core Taxonomy module, allowed us to develop a dynamically generated eBooks resource page. This type of e-resource management was selected because of its flexibility and ease of use for library website administrators. Updating these resources only requires filling out a few form fields within the browser, and the parameters of Drupal’s Views module automatically organize and sort these resources for end users. The technique to store this information stems from the work of Leo Klein.
Klein has provided an online tutorial on the creation of a library database page using Drupal [1].

Using the CCK, we created a Drupal 6 custom content type, aptly titled eBooks, to store information on our electronic resources. This includes the title of the resource, its URL, a description, and the subjects pertaining to that database.

The Taxonomy core module allows content types to be categorized with words and phrases chosen by librarians. In this case, we assigned a category of “Type” for electronic resources, designating which ones were specifically eBooks. This taxonomy is used by the Views module to differentiate between resource types.

The Views module is used to display the page of eBook resources. Views pulls the taxonomy types “electronic resources” and “eBooks” to render the page for end users. The page displays the attributes of each resource in alphabetical order. A pull-down menu is also available to display resources for specific subjects. The subject categories are also populated from the Taxonomy core module.

Figure 1. Screenshot of the Views configuration for our desktop A to Z database page.

Figure 2. How end users see our databases A to Z page, rendered by Views.

Mobile Website

The library’s mobile website is built on the jQuery Mobile framework (http://jquerymobile.com/). This framework was chosen because of its broad compatibility with a variety of mobile devices. At first, we developed a mobile site utilizing Drupal themes, mirroring the library’s desktop site. This was not an ideal situation. The Drupal theme that was used for mobile devices was overwhelming, providing all of the content of the desktop site in one huge menu. Additionally, this iteration of the site was not compatible with a number of mobile devices and phones. Some devices did not trigger the “switch” to alternate between the Drupal mobile theme and the desktop theme.
As a result, we developed a mobile site that displays only selected information and links from the main desktop site. This includes the CUNY web OPAC, mobile-friendly e-resources, contact information, and information such as library hours and directions.

Since many vendors are creating mobile versions of their resources, we felt it necessary to have a mobile-specific page to collect them all. The Pew Research Center noted that Overdrive, a distributor of eBooks for libraries, saw a mobile device usage increase of 22% in 2011 [2]. Noting this trend, we created a web page specifically for mobile electronic resources, eBooks, and iOS-specific apps.

In creating this page, we found that a lot of information was being recreated on the mobile site. This could lead to problems such as inconsistencies between the desktop and mobile sites, as well as the unnecessary work of updating two separate sites with the same information.

The mobile framework is independent of and separate from the library’s Drupal site. The two platforms do not share the same database, making it difficult to share content. In order to populate the page containing eBook resources, we needed to pull that content from the desktop site. This would eliminate redundant maintenance and ensure consistency between the two sites. This is where the PHP Simple HTML DOM Parser comes into play.

Figure 3. jQuery Mobile Framework Displaying eBook Resources.

PHP Simple HTML DOM Parser

The PHP Simple HTML DOM Parser, available at http://simplehtmldom.sourceforge.net/, is a utility that can be used to extract HTML contents and elements from a web page. Based on the parameters that are programmed into the PHP script, either HTML tags or the contents between tags may be extracted. Once this information is scraped, it can be arranged at the administrator’s discretion. The parse library, a component of the PHP Simple HTML DOM Parser, contains a number of examples on how to code a PHP parse page.
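
To illustrate the kind of extraction described above, here is a minimal sketch. It does not use the PHP Simple HTML DOM Parser itself; it relies instead on PHP's built-in DOM extension (DOMDocument and DOMXPath) so that it can run without any additional files. The function name extractLinks, the sample markup, and the example.com URLs are assumptions made purely for illustration, and the real page structure may well differ.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension, not the
// PHP Simple HTML DOM Parser, and hypothetical sample markup.

/**
 * Return the link text and href of every <a> element found inside
 * a <span> whose class list includes "field-content".
 */
function extractLinks(string $htmlSource): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly well formed.
    libxml_use_internal_errors(true);
    $doc->loadHTML($htmlSource);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word within the class attribute.
    $links = $xpath->query(
        '//span[contains(concat(" ", normalize-space(@class), " "), " field-content ")]//a'
    );

    $items = [];
    foreach ($links as $link) {
        $items[] = [
            'title' => trim($link->textContent),
            'URL'   => $link->getAttribute('href'),
        ];
    }
    return $items;
}

// Hypothetical markup loosely imitating rows of a generated listing.
$sample = '<div class="views-row">'
    . '<span class="field-content"><a href="http://example.com/ebooks-a">Example eBooks A</a></span>'
    . '</div>'
    . '<div class="views-row">'
    . '<span class="field-content"><a href="http://example.com/ebooks-b">Example eBooks B</a></span>'
    . '</div>';

foreach (extractLinks($sample) as $item) {
    echo $item['title'], ' => ', $item['URL'], PHP_EOL;
}
```

In principle the same function could be given the result of file_get_contents() for a live page, though real markup would likely need a more specific query than the one assumed here.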
Using the examples bundled with the parser as a guide, and studying how the Views module generates its HTML pages, we arrived, after some trial and error, at a page that parses our Drupal site. The process works like this:

1. The program opens the live HTML page.
2. It then identifies the specific div and span tags that indicate which elements are eBooks.
3. Once these eBooks are found, the program “scrapes” and stores specific attributes of each one. These include the title, URL, and description.
4. The final step is to display this stored content as jQuery Mobile-friendly HTML output.

Here is the source code of the scraper we wrote, using the PHP Simple HTML DOM Parser library:

find('span[class=field-content] a') as $eBook) {
    if($i%2==0){
        $item['title'] = $html->find('span[class=field-content] a', $i)->plaintext;
        $item['URL'] = $html->find('span[class=field-content] a', $j)->href;
        $eBooks[] = $item;
    }
    $j++;
    $i++;
}
$listBooks = '