The Code4Lib Journal
Issue 27, 2015-01-21
Using Google Tag Manager and Google Analytics to track DSpace metadata fields as custom dimensions
DSpace can be problematic for those interested in tracking download and pageview statistics granularly. Some libraries have implemented code to track events on websites and some have experimented with using Google Tag Manager to automate event tagging in DSpace. While these approaches make it possible to track download statistics, granular details such as authors, content types, titles, advisors, and other fields for which metadata exist are generally not tracked in DSpace or Google Analytics without coding. Moreover, it can be time-consuming to track and assess pageview data and relate that data back to particular metadata fields. This article details the learning process of incorporating custom dimensions for tracking these detailed fields, including trial-and-error attempts to use the data import function manually in Google Analytics, to automate the data import using Google APIs, and finally to automate the collection of dimension data in Google Tag Manager by mimicking SEO practices for capturing meta tags. This specific case study refers to using Google Tag Manager and Google Analytics with DSpace; however, this method may also be applied to other types of websites or systems.
by Suzanna Conrad, Digital Initiatives Librarian and Head of Digital Services & Technology, Cal Poly Pomona
Introduction
Cal Poly Pomona launched an institutional repository, dubbed “Bronco Scholar,” in February 2014 to support various types of scholarship from the campus community. As one of twenty-three California State University (CSU) campuses, Cal Poly Pomona was able to use a multi-tenant, shared instance of the open-source repository software DSpace hosted by the CSU Chancellor’s Office. While this instance provides a number of helpful statistics about bitstream downloads, searches, and other activity, accessing granular data about specific downloads is a very time-consuming, manual process. It was important to us to be able to track specific details about individual downloads and pageviews, including author names, advisor names for student research projects, content types, and titles of items accessed. And, because our DSpace instance is shared and we do not have full administrator privileges to edit code, we needed a solution that would not require much custom code.
This article details the learning process of incorporating custom dimensions for tracking these detailed fields, including trial-and-error attempts to use the data import function manually in Google Analytics, our attempts to automate the data import using Google APIs, and finally how we successfully automated collection of dimension data in Google Tag Manager by mimicking SEO practices for capturing meta tags. This specific case study demonstrates using Google Tag Manager and Google Analytics with DSpace; however, this method may also be applied to other types of websites or systems.
Google Analytics & events tracking
Google Analytics is often implemented by libraries of all types to track website usage. A simple snippet of JavaScript code on a website linked to a Google Analytics account enables Google to record page requests and capture standard data that the user agent sends along with each request, such as information about the requested page and the user agent making the request. With vanilla Google Analytics, website owners can view a standard set of metrics (such as the number of pageviews or unique pageviews) grouped by a standard set of dimensions (such as page URL, page title, user location, etc.).
Using custom JavaScript, one can capture additional data about page activity by using event tracking. With event tracking, one can set up JavaScript event handlers to submit data to Google Analytics every time a particular event is triggered (such as when a page element is clicked or a key is pressed). For instance, if one specifically wanted to track downloads of PDFs on a website, one could set a particular Google Analytics event to be recorded each time a relevant download link is clicked. Other authors have discussed technical implementations and benefits of event tracking using JavaScript edits and custom variables.[1] Once event tracking is implemented in Google Analytics, it is possible, for example, to track downloads by file types, file names, and other dimensions. Furthermore, it is possible to set up goals to better track conversions, such as tracking the percentage of users visiting the site who actually download repository content.
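As a rough illustration of the PDF download case described above, the following sketch sends an event using the analytics.js command form ga('send', 'event', category, action, label). The helper names isPdfLink and trackPdfDownload, and the category and action strings, are hypothetical and chosen only for this example; the ga function is passed in as an argument so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: decide whether a link points at a PDF file.
function isPdfLink(href) {
  // Drop any query string or fragment before checking the extension.
  var path = String(href).split(/[?#]/)[0];
  return /\.pdf$/i.test(path);
}

// Send one Google Analytics event for a PDF download.
// `ga` is an analytics.js-style command function, injected for testability.
function trackPdfDownload(ga, href) {
  if (!isPdfLink(href)) {
    return false; // not a PDF link, nothing is recorded
  }
  // Event hit fields in order: category, action, label (values are illustrative).
  ga('send', 'event', 'Download', 'PDF', href);
  return true;
}

// In a browser, a click handler on each download link could call:
//   trackPdfDownload(window.ga, link.href);
```

Passing ga as a parameter is a design choice for this sketch only; on a live page the handler would normally call the global function that the Google Analytics snippet defines.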
Google Analytics custom dimensions & metrics
If website owners want to track certain metrics or group data based on dimensions that are not included in default Google Analytics, they can set up custom metrics or dimensions as discussed in the Google Analytics Platform Principles Course. [2] Within the standard Google Analytics reporting tools it is possible to use custom dimensions as secondary dimensions when viewing reports. In our case, we were specifically interested in grouping data based on a key set of metadata tracked in DSpace: content type, author name, advisor name, and title. Adding these metadata fields as custom dimensions was a logical way to accomplish this goal. [3]
Google Tag Manager
Google Tag Manager is a product that provides a simplified means of updating website “tags.” By Google’s definition, “a tag is a snippet of JavaScript that sends information to a third party,” such as a piece of custom Google Analytics event tracking code. [4] Coates and Durrant discussed the usage of Google Tag Manager to track events specifically within DSpace. [5] However, these principles can apply to all types of websites that need to track events. As long as a code snippet for Google Tag Manager can be implemented on the website, it is possible to automate tag creation without editing the code on each and every page.
For example, many event tracking implementations include sample code such as:
Download PDF
In Google Tag Manager this function can be handled by a tag. The tag fires on a page according to the rules established for that tag.
In order for a Google Tag Manager tag to work, it relies on rules and macros. A tag in Google Tag Manager is implemented for whatever is being measured such as pageviews or downloads. A rule defines in what instance the tag should be fired. A macro, whether pre-defined or custom, is a name-value pair that the tag references to define values in the tag.
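The relationship between tags, rules, and macros can be pictured with a small conceptual model. This is only a sketch of the idea and not Google Tag Manager's actual API or internals; every object, tag name, and rule name below is hypothetical.

```javascript
// Conceptual model only: none of these objects are part of Google Tag Manager.

// A macro resolves a named value from the current page context.
var macros = {
  url: function (ctx) { return ctx.url; },
  event: function (ctx) { return ctx.event; }
};

// A rule is a condition over macro values that decides when a tag fires.
var rules = {
  allPages: function () { return true; },
  pdfLinkClick: function (values) {
    return values.event === 'linkClick' && /\.pdf$/i.test(values.url);
  }
};

// A tag names what is being measured and which rule triggers it.
var tags = [
  { name: 'Pageview', rule: 'allPages' },
  { name: 'PDF download', rule: 'pdfLinkClick' }
];

// Resolve all macros for a context, then return the names of the tags that fire.
function firedTags(ctx) {
  var values = {};
  Object.keys(macros).forEach(function (key) {
    values[key] = macros[key](ctx);
  });
  return tags
    .filter(function (tag) { return rules[tag.rule](values); })
    .map(function (tag) { return tag.name; });
}
```

In this model, a page load would fire only the pageview tag, while a click on a PDF link would fire both tags, which mirrors how one rule can apply to all pages while another applies only to specific interactions.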
Tracking events in Google Tag Manager
The first goal of our project was to track downloads of DSpace bitstreams within Google Analytics so that we could filter out administrator traffic and glean more information about the users of our repository. Specifically, we wanted to utilize built-in demographic data about our audiences and also track and analyze their behavior in a more granular fashion than is possible in DSpace. Normally we would not have the ability to maintain a set of custom events within the code of our consortial DSpace instance, but Google Tag Manager requires only the implementation of one code snippet and was easy for our administrator to implement across our entire DSpace instance. Referencing Coates and Durrant’s solution and Rachel Sweeney’s blog on setting up download tracking in Google Tag Manager, [6] we set up three tags in Google Tag Manager: a link click listener to listen for clicks on bitstream files, a total downloads tag, and an outbound links tag to listen for clicks to external websites. Two rules supported these tags: a rule that fired the tag on all pages and a rule that fired the download tag when a bitstream was clicked on. Macros were also configured based on Sweeney’s recommendations to strip out any information other than the file type and to return the file format. Using these recommendations, our tags were configured as follows:
Figure 1: Google Tag Manager configuration at CPP
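A macro that strips everything except the file format from a clicked link could be approximated as follows. This is a minimal sketch and not the macro from Sweeney's post or from our configuration; the function name fileFormat is hypothetical, and it assumes the link URL has a path whose last segment is a file name.

```javascript
// Hypothetical helper: return the lowercase file extension of a link target,
// or an empty string when the last path segment has no extension.
function fileFormat(href) {
  // Remove any query string or fragment.
  var path = String(href).split(/[?#]/)[0];
  // Keep only the last path segment (the file name).
  var name = path.substring(path.lastIndexOf('/') + 1);
  var dot = name.lastIndexOf('.');
  // No dot, a leading dot, or a trailing dot means there is no usable extension.
  if (dot <= 0 || dot === name.length - 1) {
    return '';
  }
  return name.substring(dot + 1).toLowerCase();
}
```

Lowercasing the result keeps "PDF" and "pdf" from appearing as two separate file formats in reports.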
Troubleshooting custom dimensions in Google Analytics
We still had the larger problem of not being able to use specific metadata fields as dimensions in our analytics reports. We wanted to be able to provide authors with reports of their downloads or to group traffic by certain types of content such as student research. Custom dimensions were a good solution to track these additional metadata fields.
As outlined in a blog post by Justin Cutroni, it is possible to use the data import function to “widen dimensions” with a CSV file. [7] Essentially, adding custom dimensions from an external file allows one to map external data (not available in the standard Google dimensions) to existing Google Analytics data structures using a unique key. Since the Google Analytics Academy and bloggers have published a number of step-by-step guides for importing custom dimensions using the data import function, this was our starting point for implementing custom dimensions. To test the data import function, we first set up four custom dimensions in Google Analytics using custom definitions in the Property Admin section:
Figure 2: Custom dimensions admin setup in Google Analytics
Then we set up these dimensions in the data import section (also located under Property Admin in Google Analytics), which provided us with the keys to map each of the metadata fields to a dimension, e.g., ga:dimension3 maps to the “advisor” custom dimension.
Figure 3: Mapped metadata fields in Google Analytics data import
DSpace does offer a metadata export function, which we used to get a metadata dump for the CSV file. Unfortunately, this file was very messy: metadata fields often spanned two columns, so we had to combine those fields manually. Also, quotation marks within titles were corrupted during the export, and some fields had to be normalized before they could be imported successfully into Google Analytics. We were only interested in tracking four metadata fields in custom dimensions, so we also had to reduce the CSV to include only those four columns and the unique key that Google Analytics would reference to assign dimensions to pages. In our case, the unique key was the pagePath, which corresponded to the handle URL in DSpace. Our final CSV included a column for the pagePath and four columns for the custom dimensions as defined in Google Analytics.
Figure 4: Sample CSV file for Google Analytics data import
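The reduction and normalization steps described above could be scripted along these lines. This is a sketch under stated assumptions rather than the process we actually followed, which was manual: the record property names are hypothetical, and only the mapping of ga:dimension3 to advisor is taken from our setup, so the other index assignments are placeholders.

```javascript
// Column order for the upload file. Only ga:dimension3 = advisor is stated in
// the article; the other index assignments are assumptions for illustration.
var COLUMNS = [
  ['ga:pagePath', 'pagePath'],
  ['ga:dimension1', 'contentType'],
  ['ga:dimension2', 'author'],
  ['ga:dimension3', 'advisor'],
  ['ga:dimension4', 'title']
];

// Normalize one value: collapse whitespace, then quote it if it contains
// a comma or a double quote (embedded quotes are doubled, per CSV convention).
function csvField(value) {
  var s = value == null ? '' : String(value).replace(/\s+/g, ' ').trim();
  if (/[",]/.test(s)) {
    s = '"' + s.replace(/"/g, '""') + '"';
  }
  return s;
}

// Build the CSV text: one header line, then one line per record.
function toImportCsv(records) {
  var lines = [COLUMNS.map(function (c) { return c[0]; }).join(',')];
  records.forEach(function (record) {
    lines.push(COLUMNS.map(function (c) {
      return csvField(record[c[1]]);
    }).join(','));
  });
  return lines.join('\n');
}
```

Doubling embedded quotation marks is the step that addresses titles containing quotations, which was one of the places the exported file broke.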
Once we had cleaned up the CSV file and set the paths and dimensions to correspond to the custom dimensions in Google Analytics, we were able to upload the CSV under the “Data Import” section under Property Admin. This process was successful and we were able to see the custom dimensions in our reports.
Figure 5: Custom dimensions displaying in Google Analytics (data import)
There were a number of disadvantages to this method. Dimension widening is not retroactive, so the dimensions applied only to downloads and pageviews occurring after the data import had happened. The manual process of cleaning the CSV was also extremely time-consuming, and a more automated process would ensure that all downloads and pageviews could be captured. Within the Platform Principles course, the Google Analytics Academy also mentions APIs for dimension widening, so we began to investigate how we could automate this process using the APIs instead.
We tested scraping the site and dumping the correct metadata fields into a CSV that could be pulled into Google Analytics using the APIs. We were able to get a cleaner data pull of the metadata fields than with the DSpace metadata export. However, two problems still existed. First, a process still had to run to pull the data into Google Analytics using the APIs, and second, the dimensions would only be recorded as of the time that process ran, which meant that usage statistics of new uploads would not be tracked immediately after upload.
Since Google Tag Manager was already firing on our DSpace instance pages, we began investigating the possibility of using existing tags to track metadata fields and include those fields as custom dimensions. One SEO blogger discussed the possibility of tracking meta tag values using Google Tag Manager to determine what optimized keywords were performing best on individual webpages. [8] DSpace includes metadata fields as meta tags in the page source, so this solution offered automated tracking of DSpace item-level metadata using custom dimensions.
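To illustrate how meta tag values can be read from a page in general, the sketch below collects the content of every meta element with a given name. It is a generic example, not the macro we deployed; the function name metaValues is hypothetical, and the meta tag name used in the usage comment (DC.creator) is an assumption about how a given DSpace theme labels authors, so it should be checked against the actual page source.

```javascript
// Collect the content attribute of every meta element whose name matches.
// `doc` is any object exposing getElementsByTagName, such as the browser's
// `document`; it is a parameter so the function can be tested without a DOM.
function metaValues(doc, name) {
  var metas = doc.getElementsByTagName('meta');
  var wanted = String(name).toLowerCase();
  var values = [];
  for (var i = 0; i < metas.length; i++) {
    var metaName = metas[i].getAttribute('name');
    // Compare names case-insensitively; skip meta elements without a name.
    if (metaName && metaName.toLowerCase() === wanted) {
      values.push(metas[i].getAttribute('content') || '');
    }
  }
  return values;
}

// Browser usage (the meta name is an assumption, see the note above):
//   var authors = metaValues(document, 'DC.creator').join('; ');
```

Returning an array rather than a single string leaves it to the caller to decide how items with several authors should be represented in one dimension value.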
Our solution using Google Tag Manager
First, as with our earlier “dimension widening” solution, in Google Analytics we added a new custom dimension each for content type, author, advisor, and title. [9]
Figure 6: Additional custom dimensions in Google Analytics
We set the scope to “hit.” Other options include user, session, and product. With user or session scope, a single value (the most recently set one) is applied across all of the hits in that user's sessions or in that session, so pageviews and downloads of different items would not each keep their own item's metadata. The scope “product” is an e-commerce-specific scope that applies to a product for purchase. We noted the index number of each dimension so that it could be mapped in Google Tag Manager.
In Google Tag Manager, we had already configured tags and rules to collect download statistics per the examples above (see the section, Tracking events in Google Tag Manager). We set up two additional types of macros: a data layer macro to hold the information for each dimension, and a custom JavaScript macro that collected the value of the relevant meta tag element from the page. For example, this is the macro we set up to pull author information:
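The role of the index number can be seen in how a hit-scoped custom dimension is attached to a hit: analytics.js accepts fields named dimension followed by the index, for example dimension2. The sketch below builds such a field object from item metadata. The function name dimensionFields, the property names, and the index values are all hypothetical placeholders; in Google Tag Manager the same mapping is configured in the tag's settings rather than written by hand.

```javascript
// Build an analytics.js field object that carries custom dimensions on one hit.
// `indexMap` maps a custom dimension index to a metadata property name.
function dimensionFields(indexMap, metadata) {
  var fields = {};
  Object.keys(indexMap).forEach(function (index) {
    var value = metadata[indexMap[index]];
    // Only include dimensions that actually have a value for this item.
    if (value) {
      fields['dimension' + index] = value;
    }
  });
  return fields;
}

// Browser usage with placeholder indices (check the Admin screen for real ones):
//   var map = { 1: 'contentType', 2: 'author', 3: 'advisor', 4: 'title' };
//   ga('send', 'pageview', dimensionFields(map, itemMetadata));
```

Because the values travel with each individual hit, a pageview of one item and a download of another each keep their own metadata, which is the behavior that hit scope provides.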
function getData() {
var x = document.getElementsByTagName("META");
var txt = [];
for (var i=0;i