Reading Journal of eScience Librarianship: Responsible AI in Libraries and Archives

A special issue of the Journal of eScience Librarianship, on the topic of responsible AI in libraries and archives, was brought to my attention. I did a bit of distant reading against the issue, and outlined here are some of my take-aways.

Basic characteristics

There are nine articles in the issue for a total of 49,000 words. (See the rudimentary bibliography.) Thus, based on my experience, none of the articles is particularly long or short. A rudimentary count & tabulation of unigrams, bigrams, and statistically significant keywords can be visualized, and from the results one can begin to get an idea of what is discussed in the articles:


[Figures: unigrams, bigrams, keywords]

For additional descriptive statistics-like analysis, see the generic index page.
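
A count & tabulation of unigrams and bigrams, like the one described above, might be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used to produce the visualizations. The stop word list, the function names, and the sample text are all assumptions made for the sake of the example, and the computation of statistically significant keywords is not covered.

```python
import re
from collections import Counter

# A tiny, hypothetical stop word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the n-grams (as tuples) found in a list of tokens."""
    return list(zip(*(tokens[i:] for i in range(n))))


def tabulate(text, n=1, size=5):
    """Count the n-grams in the text and return the `size` most frequent."""
    return Counter(ngrams(tokenize(text), n)).most_common(size)


if __name__ == "__main__":
    # Hypothetical sample text, for illustration only.
    sample = "Responsible AI in libraries. Ethical AI in archives. Responsible AI matters."
    print(tabulate(sample, n=1))
    print(tabulate(sample, n=2))
```

Because stop words are removed before the bigrams are built, and because sentence boundaries are ignored, some of the resulting bigrams join words that were not adjacent in the original text. For a rudimentary overview that is usually acceptable.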

Topics/themes and their distribution

Through the use of topic modeling, it is possible to enumerate over-arching themes. After removing some stop words and modeling the corpus for nine topics (because there are nine articles), the following themes presented themselves, and their distribution over the whole issue can be visualized:

        topics  weights  features
          data  0.89933  data university project research ethical new
   information  0.23128  information research chatbots chatgpt provided
       science  0.18005  science recommendation system academic service
       records  0.14033  data records learning library community japan
      keenious  0.11057  keenious library tools questions libraries
     sentiment  0.10152  sentiment analysis beatles articles historical
   descriptors  0.07133  descriptors metadata fashion costume core term
         nhgri  0.05405  nhgri archive genome human project documents
          news  0.04590  news transcripts vtna television collection

When the topic model is supplemented with author metadata values, it can be pivoted to address the question, "What authors discuss what topics?" From the results we can see that each author discusses something unique, but to some degree, each author also discusses the theme of data-university-project-research. For example, Elings discusses data and records:

Network analysis

Modeling the special issue's articles in the form of network graphs is another way to garner what is discussed by whom and to what degree. For example, authors write articles and articles can be described with keywords. These things can be represented as node/edge combinations: authors --> articles --> keywords. Similarly, keywords have semantic relationships to other keywords, and those other keywords point to additional keywords: keywords --> keywords --> keywords. When it comes to the former, we can see how many authors discuss shared ideas (the words in the center of the graph), while they also speak to things unique to themselves. When it comes to the latter, we can see how the contents of the issue are akin to a spectrum of ideas beginning with information and moving towards analysis.
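
The authors --> articles --> keywords structure described above can be sketched as a small directed graph. The following is a minimal illustration using only the Python standard library. The author, article, and keyword values are hypothetical placeholders rather than data from the study carrel, and the function names are my own.

```python
from collections import defaultdict

# Hypothetical edges, for illustration only; not taken from the study carrel.
AUTHOR_TO_ARTICLE = [
    ("author-a", "article-1"),
    ("author-b", "article-2"),
]
ARTICLE_TO_KEYWORD = [
    ("article-1", "data"),
    ("article-1", "records"),
    ("article-2", "data"),
    ("article-2", "chatbots"),
]


def build_graph(*edge_lists):
    """Build a directed graph, as an adjacency mapping, from lists of (source, target) edges."""
    graph = defaultdict(set)
    for edges in edge_lists:
        for source, target in edges:
            graph[source].add(target)
    return graph


def keywords_by_author(graph, authors):
    """Follow author --> article --> keyword paths and collect the keywords per author."""
    return {a: {k for article in graph[a] for k in graph[article]} for a in authors}


def shared_keywords(by_author):
    """Return the keywords reached from more than one author (the center of the graph)."""
    counts = defaultdict(int)
    for keywords in by_author.values():
        for keyword in keywords:
            counts[keyword] += 1
    return {keyword for keyword, n in counts.items() if n > 1}


if __name__ == "__main__":
    graph = build_graph(AUTHOR_TO_ARTICLE, ARTICLE_TO_KEYWORD)
    by_author = keywords_by_author(graph, ["author-a", "author-b"])
    print(by_author)
    print(shared_keywords(by_author))
```

Keywords reached from more than one author correspond to the shared ideas in the center of the graph, while the remaining keywords are the things unique to each author.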

Closer reading

The distant reading garnered a view of the special issue from 30,000 feet, so to speak. More importantly, it highlighted for me a number of words of interest. I call this my lexicon, and these words include: responsible, ethical, artificial intelligence, AI, data, information, science, records, keenious, sentiment, descriptors, nhgri, and news. I now ask myself, "What can I specifically learn about each of these lexicon words? What are they, and what is important about them?" In other words, I want to do some closer reading.

keenious and nhgri

What in the world are keenious and nhgri? At first glance, "keenious" appears to be some sort of adjective, and "nhgri" appears to be an OCR abomination. I first queried my full text index and learned that each of these keywords appears in one and only one document. Keenious only appears in pastva-implementation-2024 and nhgri only appears in hosseini-ethical-2024. Through the use of sentence extraction techniques, the following definitions of keenious and nhgri came to light:

What are artificial intelligence and AI?

Using a similar sentence extraction technique, I queried the corpus for things like "artificial intelligence is" and "ai is", but there were too few results. I broadened my query to sentences merely including the word "intelligence", and after perusing the results, I was not able to point to a definition of artificial intelligence or AI. Bummer. Can you point me to definitions of such things?
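
Sentence extraction of the kind used above can be approximated with a couple of regular expressions. What follows is a minimal sketch, not the tool I actually used. The sentence splitter is deliberately naive (it will stumble on abbreviations), and the function names and sample text are hypothetical.

```python
import re


def sentences(text):
    """Naively split text into sentences on ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract(text, phrase):
    """Return the sentences containing the phrase as a whole-word, case-insensitive match."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [s for s in sentences(text) if pattern.search(s)]


if __name__ == "__main__":
    # Hypothetical sample text, for illustration only.
    sample = (
        "Widget is a tool for discovery. It was adopted in 2023. "
        "What is widget? Nobody said widgets are easy."
    )
    print(extract(sample, "widget is"))
    print(extract(sample, "widget"))
```

Querying for a phrase such as "widget is" tends to surface definition-like sentences, while querying for the bare word casts a wider net, which mirrors the way the query was broadened above.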

Responsible and ethical

Using the same technique again, I extracted a list of sentences containing the words "responsible" or "ethical", and I've listed some of the more interesting ones below:

Modals

Modal sentences -- sentences containing modal verbs like "should", "ought", or "will" -- can be quite telling because such sentences are assertive. Using the lexicon as input, I output a list of modal sentences containing lexicon words. Listed here are some of the more interesting sentences:

AI on AI

I then applied a large language model indexing technique -- RAG (retrieval-augmented generation) -- against the content -- AI on AI. Considering what I had learned so far, I queried the index. The results are more than plausible, and each result is associated with the one or more documents from whence the result was generated:

Ask anything: What is responsibility?

Responsibility in this context entails collaborating with experts in
behavioral sciences, artificial intelligence, and social work to define
research consultation scenarios, lead case simulation construction and
analysis, review and select GPT-based products, provide technical
guidance on chatbot inquiries, and access supplemental ShareGPT data. It
also involves ensuring access to necessary resources and tools, such as
leading GPT-based products, without the need for special computing
software or capabilities.

References: 
  1. anderson-responsible-2024 / anderson (2024) (7)
  2. feng-ethical-2024 / feng (2024) (4)
  3. mannheimer-introduction-2024 / mannheimer (2024) (1)


Ask anything: What is artificial intelligence or AI?

Artificial intelligence, often referred to as AI, is a technology that
can automate tasks previously done by humans. It has the capacity to
analyze large amounts of data and generate new insights. However, AI
implementation raises concerns about biases, reproducibility, and
ethical considerations. While AI can improve research efficiency and
accuracy, it is important to recognize its limitations and potential
impacts on various aspects of work and society.

References: 
  1. feng-ethical-2024 / feng (2024) (4)
  2. mannheimer-introduction-2024 / mannheimer (2024) (3)
  3. pastva-implementation-2024 / pastva (2024) (2)
  4. anderson-responsible-2024 / anderson (2024) (1)
  5. elings-using-2024 / elings (2024) (1)
  6. mcirvin-automatic-2024 / mcirvin (2024) (1)


Ask anything: What are libraries?

Libraries are institutions that prioritize innovation and aim to create
a 21st-century library that serves as a cornerstone of world-class
research and scholarship. They seek out new tools and resources to
enrich the scholarly information ecosystem, improve the resource
discovery process, and point users to relevant research available in the
library. Libraries also acknowledge that research begins outside of the
library and aim to improve the research process while pointing back to
library resources. Additionally, libraries engage in partnerships with
faculty, educators, and service areas to enhance understanding of
library service use and provide effective support to the community.

References: 
  1. pastva-implementation-2024 / pastva (2024) (6)
  2. beltran-open-2024 / beltran (2024) (2)
  3. elings-using-2024 / elings (2024) (2)
  4. feng-ethical-2024 / feng (2024) (2)


Ask anything: What are the responsible and ethical issues surrounding
the use of artificial intelligence in libraries?

Privacy, consent, accuracy, labor considerations, the digital divide,
bias, and transparency are the responsible and ethical issues
surrounding the use of artificial intelligence in libraries as discussed
in the provided context. It is essential to address these issues when
incorporating AI tools and systems in library services to uphold ethical
standards and ensure responsible technology use.

References: 
  1. feng-ethical-2024 / feng (2024) (5)
  2. pastva-implementation-2024 / pastva (2024) (3)
  3. mannheimer-introduction-2024 / mannheimer (2024) (3)
  4. anderson-responsible-2024 / anderson (2024) (1)
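
The retrieval half of a RAG query like the ones above can be illustrated with a toy example. To be clear, this is not how the index above was built. Real RAG systems typically rank passages with learned embeddings and then hand the best ones to a large language model, whereas this sketch swaps in simple bag-of-words cosine similarity and stops at assembling a prompt. The passages, identifiers, and function names are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Represent text as a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Compute the cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, passages, k=2):
    """Return the k (doc_id, passage) pairs most similar to the question."""
    q = vectorize(question)
    ranked = sorted(passages, key=lambda p: cosine(q, vectorize(p[1])), reverse=True)
    return ranked[:k]


def build_prompt(question, retrieved):
    """Assemble the retrieved passages and the question into a prompt for a language model."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Hypothetical passages, for illustration only.
    passages = [
        ("doc-1", "Privacy and bias are concerns."),
        ("doc-2", "The archive holds television news."),
        ("doc-3", "Bias in metadata is a concern."),
    ]
    question = "What about bias and privacy?"
    print(build_prompt(question, retrieve(question, passages)))
```

Keeping the document identifiers alongside the retrieved passages is what makes it possible to list references with each answer, as in the results above.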

Summary

Through the use of text mining, natural language processing, and a few machine learning computing techniques, I analyzed -- "read" -- a special issue of the Journal of eScience Librarianship on the topic of responsible AI in libraries and archives. Based on my analysis, the responsible and ethical use of AI in libraries centers on privacy and bias. Moreover, there is a perception that artificial intelligence can be used effectively in libraries but not until the issues of privacy and bias are addressed.

Epilogue

This analysis was done by first creating a Distant Reader data set -- affectionately called a "study carrel" -- and the data set, as well as all of the modeling done against it, is temporarily available as a zip file at the following URL:

http://carrels.distantreader.org/curated-jeslib_v13_n01-2024/index.zip


Eric Lease Morgan <emorgan@nd.edu>
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame

March 13, 2024