A colleague (Christa Strickler) announced on a mailing list (ACQNET) the existence of a new issue of TCB (Technical Services in Religion and Theology). It was touted as an open access journal, and I wondered whether there was an application programming interface (API) for downloading the content. After a bit of rooting around, I discovered that TCB is published using a system called Open Journal Systems (OJS), and OJS rigorously supports a protocol called OAI-PMH. So, to answer my question, "Yes, TCB does support an API."
I then wondered how easy it would be to actually harvest/cache/acquire the content of TCB and then do some analysis against the result. Coincidentally, I had played with this exact same idea a few years ago. More specifically, I wrote a system of software to harvest the whole of another journal, Information Technology And Libraries. Looking through my archives, I found the desired code, updated it to point to TCB, and in less than two minutes I had downloaded the whole of TCB. The code to do this work is called OJS Toolbox Redux, and the resulting content ought to be here in the cache.
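For readers unfamiliar with OAI-PMH, the following is a minimal sketch of what such a harvest involves. It is not the OJS Toolbox Redux code. It assumes the repository exposes Dublin Core (oai_dc) metadata and that you know its OAI-PMH base URL; the function names build_request, parse_records, and harvest are my own inventions for the sake of illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def build_request(base_url, resumption_token=None):
    """Return a ListRecords URL; a resumption token replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

def parse_records(xml_text):
    """Return ([(identifier, title), ...], resumption token or None) for one response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        # deleted records carry no metadata, so the title may be empty
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, (token or None)

def harvest(base_url):
    """Yield (identifier, title) pairs, following resumption tokens until exhausted."""
    token = None
    while True:
        with urlopen(build_request(base_url, token)) as response:
            records, token = parse_records(response.read())
        yield from records
        if token is None:
            break
```

Calling harvest() with the journal's OAI-PMH base URL (which I have not listed here) would walk through every record, one page of results at a time. Downloading the full text of each article is a separate step, since OAI-PMH itself only describes the metadata.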
I then wanted to get my mind around the whole of TCB. What sorts of things were discussed? How have those things ebbed & flowed over time? To answer these questions, I used a thing called the Distant Reader Toolbox to create a data set from the cache and then do some analysis ("reading") against it. Here are a few of the rudimentary things I discovered:
The following word clouds illustrate the frequency of unigrams, bigrams, and statistically significant keywords found in the corpus. The content of the corpus lives up to its name, obviously.
[Word clouds: unigrams | bigrams | keywords]
(If you want to take a gander at some additional characteristics of this data set, then check out the rudimentary index page.)
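If you are curious how unigram and bigram frequencies like these can be tabulated, here is a rough sketch using nothing but the Python standard library. It is not how the Toolbox does it; the tokenizer is deliberately naive, there is no stop word list, and the function names are my own.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep only runs of letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def ngram_counts(tokens, n=1):
    """Count every run of n consecutive tokens; n=1 gives unigrams, n=2 bigrams."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)

tokens = tokenize("RDA cataloging and RDA cataloging records")
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
```

The most_common() method of each counter then returns the (n-gram, frequency) pairs that a word cloud program would need as input.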
I then applied topic modeling to the corpus, and since the title has been in existence for twelve years, I topic modeled for twelve topics. This resulted in the following enumeration of themes, and the pie chart illustrates the dominance of the themes across the whole. Again the result echoes the name of the journal.
labels       weights   features
library      0.07629   library course data cataloging digital metadat...
rda          0.06975   rda atla cataloging library funnel naco conser...
cataloging   0.06966   library cataloging quarterly classification se...
terms        0.06435   rda terms library cataloging religion form atl...
records      0.05667   records data record library cataloging use inf...
class        0.03728   class topics individual theology cataloging li...
heading      0.02412   heading music field headings add terms genre/f...
cancel       0.02192   cancel church heading religious theology chris...
india        0.02151   class india history literature information chu...
collection   0.01955   collection openathens library oer resources ht...
tcb          0.01453   library tcb maps information cataloging san ma...
field        0.01361   heading field add literature former bible chan...
To illustrate how these themes ebbed & flowed over time, I augmented the underlying topic model with a year column, pivoted the model, and created the following stacked area chart. From the result we can see that the topic of "rda" was predominant between 2010 and 2012. We can see that "terms" had its moment just after that, but upon closer inspection, "terms" was still largely about "rda". We can also see that the themes of "cataloging" and "library" are pretty consistent throughout time.
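The pivoting step can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the Toolbox's own routine: I assume the model has been flattened into (year, topic, weight) rows, one per document and topic, and I normalize each year so its topic shares sum to one, which is what a stacked area chart of proportions wants.

```python
from collections import defaultdict

def pivot_by_year(rows):
    """Turn (year, topic, weight) rows into {year: {topic: share}}, shares summing to 1 per year."""
    totals = defaultdict(lambda: defaultdict(float))
    for year, topic, weight in rows:
        totals[year][topic] += weight
    pivoted = {}
    for year in sorted(totals):
        year_total = sum(totals[year].values())
        if year_total == 0:
            continue  # nothing to normalize for this year
        pivoted[year] = {topic: weight / year_total for topic, weight in totals[year].items()}
    return pivoted

# made-up weights, for illustration only
rows = [
    (2010, "rda", 3.0),
    (2010, "library", 1.0),
    (2011, "rda", 1.0),
    (2011, "library", 1.0),
]
shares = pivot_by_year(rows)
```

Each year becomes one slice along the x-axis of the chart, and each topic's share becomes the height of its band in that slice.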
You might ask, "Given this analysis, can you recommend some salient articles elaborating on the themes?" And my answer is, "Sure!" For example, a theme seems to be "rda". Searching the underlying data set's full text database, the following three articles are specifically about RDA and have RDA in the title:
One article specifically about TCB itself is an editorial by Cynthia Snell (2021-08-23). From the computed summary:
Therefore, in order to appeal to a broader audience—including persons acquiring and cataloging materials at museums and archives—as well as to provide opportunities for interdisciplinary engagement with other library technical services professionals, we will roll out an expanded TCB beginning in 2022. TCB will remain a publication that focuses on the needs of technical services professionals, transforming from a publication for catalogers of materials in religion and theology to one that addresses the interests of all technical services staff who may be working with materials in religion and theology. –the Editors
They say that if you have a hammer, then everything begins to look like a nail. Well, my current hammer is the Distant Reader Toolbox, and I enjoy using the Toolbox to practice librarianship. With the Toolbox I create and curate collections. I then provide services against them. This missive outlined one of my explorations.
If you want to play with this collection, then begin by downloading the data set, which is available at http://carrels.distantreader.org/curated-where_in_the_world_is_tcb-2024/index.zip. The compressed zip file is made up of mostly plain text files, a relational database, and a few images. You can then use the Toolbox or any number of other tools to do your own analysis. Other tools include Wordle, OpenRefine, AntConc, any spreadsheet or database program, or even your text editor. Enjoy!
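Before opening the data set in any of those tools, you might want a quick inventory of what the zip file contains. The following sketch counts the archive's files by extension. It assumes nothing about the data set's internal layout, and the inventory function is my own invention, not part of the Toolbox.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

def inventory(zip_source):
    """Count the files in a zip archive by extension; zip_source is a path or file-like object."""
    with zipfile.ZipFile(zip_source) as archive:
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    return Counter(PurePosixPath(name).suffix.lower() or "(none)" for name in names)
```

After downloading the file, something like inventory("index.zip").most_common() will list the extensions from most to least frequent, which is a handy way to see how much of the data set is plain text.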
Eric Lease Morgan <emorgan@nd.edu>
Navari Family Center for Digital Scholarship
University of Notre Dame
Date created: October 28, 2022
Date updated: May 30, 2024