A special issue of Journal of eScience Librarianship was brought to my attention. The issue was on the topic of responsible AI in libraries and archives. I did a bit of distant reading against the issue, and outlined here are some of my take-aways.
There are nine articles in the issue for a total of 49,000 words. (See the rudimentary bibliography.) Thus, based on my experience, none of the articles are particularly long nor short. A rudimentary count & tabulation of unigrams, bigrams, and statistically significant keywords can be visualized, and from the results one can begin to get an idea of what is discussed in the articles:
unigrams |
bigrams |
keywords
For additional descriptive statistics-like analysis, see the generic index page.
Through the use of topic modeling, it is possible enumerate over-arching themes. After removing some stop words and modeling the corpus for nine topics (because there are nine articles), the following themes presented themselves and their distribution over the whole issue can be visualized:
topics weights features data 0.89933 data university project research ethical new information 0.23128 information research chatbots chatgpt provided science 0.18005 science recommendation system academic service records 0.14033 data records learning library community japan keenious 0.11057 keenious library tools questions libraries sentiment 0.10152 sentiment analysis beatles articles historical descriptors 0.07133 descriptors metadata fashion costume core term nhgri 0.05405 nhgri archive genome human project documents news 0.04590 news transcripts vtna television collection
When the underlying topic model is supplemented with author metadata values, the underlying model can be pivoted to address the question, "What authors discuss what topics?" From the results we can see that each author discusses something unique, but to some degree, each author discusses the theme of data-university-project-research. For example, Elings discusses data and records:
Modeling the special issue's articles in the form of network graphs is another way to garner what is discussed by whom and to what degree. For example, authors write articles and articles can be described with keywords. These things can be represented as nodes/edges combinations: authors --> articles --> keywords. Similarly, keywords have semantic relationships to other keywords, and those other keywords point to additional keywords: keywords --> keywords --> keywords. When it comes to the former, we can see how many authors discuss shared ideas (the words in the center of the graph), while they also speak to things unique to themselves. When it comes to the later, we can see how the contents of the issue are akin to a spectrum of ideas beginning with information and moving towards analysis.
The distant reading garnered a view of the special issue from 30,000 feet, so to speak. More importantly, it highlighted for me a number of words of interest. I call this my lexicon, and these words include: responsible, ethical, artificial intelligence, AI, data, information, science, records, keenious, sentiment, descriptors, nhgri, and news. I now ask myself, "What can I specifically learn about each of these lexicon words? What are they, and what is important about them?" In other words, I want to do some closer reading.
Using a similar sentence extraction technique I queried the corpus for things like "artificial intelligence is" and "ai is", but there were too few results. I broadened my query for sentences merely including the word "intelligence", and after perusing the results, I was not able to point to a definition of artificial intelligence nor AI. Bummer. Can you point me to definitions of such things?
Using the same technique again, I extracted a list of sentences containing the words "responsible" or "ethical", and I've listed some of the more interesting ones below:
Modal sentences -- sentences containing modal verbs like "should", "ought" , or "will" -- can be quite telling because such sentences are assertive. Using the lexicon as input, I output a list of modal sentences containing lexicon words. Listed here are some of the more interesting sentences:
I then applied a large-language indexing technique -- RAG (retrieval augmented generation) -- against the content -- AI on AI. Considering what I had learned so far, I queried the index. The results are more than plausible, and each result is associated with the one more more documents from whence the result was generated:
Ask anything: What is responsibility? Responsibility in this context entails collaborating with experts in behavioral sciences, artificial intelligence, and social work to define research consultation scenarios, lead case simulation construction and analysis, review and select GPT-based products, provide technical guidance on chatbot inquiries, and access supplemental ShareGPT data. It also involves ensuring access to necessary resources and tools, such as leading GPT-based products, without the need for special computing software or capabilities. References: 1. anderson-responsible-2024 / anderson (2024) (7) 2. feng-ethical-2024 / feng (2024) (4) 3. mannheimer-introduction-2024 / mannheimer (2024) (1) Ask anything: What is artificial intelligence or AI? Artificial intelligence, often referred to as AI, is a technology that can automate tasks previously done by humans. It has the capacity to analyze large amounts of data and generate new insights. However, AI implementation raises concerns about biases, reproducibility, and ethical considerations. While AI can improve research efficiency and accuracy, it is important to recognize its limitations and potential impacts on various aspects of work and society. References: 1. feng-ethical-2024 / feng (2024) (4) 2. mannheimer-introduction-2024 / mannheimer (2024) (3) 3. pastva-implementation-2024 / pastva (2024) (2) 4. anderson-responsible-2024 / anderson (2024) (1) 5. elings-using-2024 / elings (2024) (1) 6. mcirvin-automatic-2024 / mcirvin (2024) (1) Ask anything: What are libraries? Libraries are institutions that prioritize innovation and aim to create a 21st-century library that serves as a cornerstone of world-class research and scholarship. They seek out new tools and resources to enrich the scholarly information ecosystem, improve the resource discovery process, and point users to relevant research available in the library. Libraries also acknowledge that research begins outside of the library and aim to improve the research process while pointing back to library resources. Additionally, libraries engage in partnerships with faculty, educators, and service areas to enhance understanding of library service use and provide effective support to the community. References: 1. pastva-implementation-2024 / pastva (2024) (6) 2. beltran-open-2024 / beltran (2024) (2) 3. elings-using-2024 / elings (2024) (2) 4. feng-ethical-2024 / feng (2024) (2) Ask anything: What are the responsible and ethical issues surrounding the use of artificial intelligence in libraries? Privacy, consent, accuracy, labor considerations, the digital divide, bias, and transparency are the responsible and ethical issues surrounding the use of artificial intelligence in libraries as discussed in the provided context. It is essential to address these issues when incorporating AI tools and systems in library services to uphold ethical standards and ensure responsible technology use. References: 1. feng-ethical-2024 / feng (2024) (5) 2. pastva-implementation-2024 / pastva (2024) (3) 3. mannheimer-introduction-2024 / mannheimer (2024) (3) 4. anderson-responsible-2024 / anderson (2024) (1)
Through the use of text mining, natural language processing, and a few machine learning computing techniques I analyzed -- "read" -- a special issue of Journal of eScience Librarianship on the topic of responsible AI in libraries and archives. Based on my analysis the responsible and ethical use of AI in libraries surrounds privacy and bias. Moreover, there is a perception that artificial intelligence can be used effectively in libraries but not until the issues privacy and bias are addressed.
This analysis was done by first creating a Distant Reader data set -- affectionately called a "study carrel", and the data set as well as all of the modeling done against it is temporarily available as a zip file at the following URL:
http://carrels.distantreader.org/curated-jeslib_v13_n01-2024/index.zip
Eric Lease Morgan <emorgan@nd.edu>
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame
March 13, 2024