Introduction to the Special Issue: Responsible AI in Libraries and Archives

Journal of eScience Librarianship 13 (1): e860
DOI: https://doi.org/10.7191/jeslib.860
ISSN 2161-3974

Editorial

Sara Mannheimer, Montana State University, Bozeman, MT, USA, sara.mannheimer@montana.edu
Doralyn Rossmann, Montana State University, Bozeman, MT, USA
Jason Clark, Montana State University, Bozeman, MT, USA
Yasmeen Shorish, James Madison University, Harrisonburg, VA, USA
Natalie Bond, University of Montana, Missoula, MT, USA
Hannah Scates Kettler, Iowa State University, Ames, IA, USA
Bonnie Sheehey, Montana State University, Bozeman, MT, USA
Scott W. H. Young, Montana State University, Bozeman, MT, USA

Focus

Librarians and archivists are often early adopters of and experimenters with new technologies. Our field is also interested in critically engaging with technology, and we are well-positioned to be leaders in the slow and careful consideration of new technologies. Therefore, as librarians and archivists have begun using artificial intelligence (AI) to enhance library services, we also aim to interrogate the ethical issues that arise while using AI to enhance collection description and discovery and streamline reference services and teaching. The IMLS-funded Responsible AI in Libraries and Archives project aims to create resources that will help practitioners make ethical decisions when implementing AI in their work. The case studies in this special issue are one such resource. Seven overarching ethical issues come to light in these case studies—privacy, consent, accuracy, labor considerations, the digital divide, bias, and transparency. This introduction reviews each issue and describes strategies suggested by case study authors to reduce harms and mitigate these issues.
Received: December 14, 2023 | Accepted: February 5, 2024 | Published: March 6, 2024

Keywords: responsible AI, artificial intelligence, privacy, consent, accuracy, labor, digital divide, bias, transparency

Citation: Mannheimer, Sara, Doralyn Rossmann, Jason Clark, Yasmeen Shorish, Natalie Bond, Hannah Scates Kettler, Bonnie Sheehey, and Scott W. H. Young. 2024. "Introduction to the Responsible AI Special Issue." Journal of eScience Librarianship 13 (1): e860. https://doi.org/10.7191/jeslib.860.

The Journal of eScience Librarianship is a peer-reviewed open access journal. © 2024 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0.

Introduction

Librarians and archivists are often early adopters of and experimenters with new technologies. Our field is also interested in critically engaging with technology, and we are well-positioned to be leaders in the slow and careful consideration of new technologies. Therefore, as librarians and archivists begin using artificial intelligence (AI) to enhance library services, we also aim to interrogate the ethical issues that arise.
The IMLS-funded Responsible AI in Libraries and Archives project aims to create resources that will help practitioners make ethical decisions when implementing AI in their work. The case studies in this special issue are one such resource. The eight responsible AI case studies included here show the variety of ways in which librarians and archivists are currently using AI in their practice, with a special focus on the ethical issues and considerations that arise over the course of implementing AI tools and systems. The case studies include examples of using recommender systems, both library-built systems ("Open Science Recommendation Systems for Academic Libraries") and vendor systems ("The Implementation of Keenious at Carnegie Mellon University"); an experiment with ChatGPT ("Ethical Considerations in Integrating AI in Research Consultations: Assessing the Possibilities and Limits of GPT-Based Chatbots"); a computational investigation of the Human Genome Project archives ("Ethical Considerations in Utilizing Artificial Intelligence for Analyzing the NHGRI's History of Genomics and Human Genome Project Archives"); sentiment analysis of news articles about the Beatles ("'I've Got a Feeling': Performing Sentiment Analysis on Critical Moments in Beatles History"); using natural language processing to generate richer description for historical costuming artifacts ("Automatic Expansion of Metadata Standards for Historic Costume Collections"); using automated speech recognition and computer vision to create transcripts and metadata for a television news archive ("Responsible AI at the Vanderbilt Television News Archive: A Case Study"); and partnering with an AI company to extract metadata from historical images ("Using AI/Machine Learning to Extract Data from Japanese American Confinement Records"). Seven overarching ethical issues come to light in these case studies—privacy, consent, accuracy, labor considerations, the digital divide, bias, and transparency.
We review these issues further below, including strategies suggested by case study authors to reduce harms and mitigate these issues.

Privacy

Most of the case studies in this issue consider privacy in their AI project implementation. Beltran, Griego, and Herckis, in their discussion of a library-built open science recommendation system, "Open Science Recommendation Systems for Academic Libraries," suggest that ongoing development of rules, policies, and norms can support privacy. For vendor tools, Pastva et al. describe working with a vendor to ensure that the vendor's privacy policy and terms of service aligned with library and archives values and practices in "The Implementation of Keenious at Carnegie Mellon University." Other case studies discuss how digitization and larger-scale availability of archival records can lead to complexities related to privacy. Wolff, Mainzer, and Drummond's case study, "'I've Got a Feeling': Performing Sentiment Analysis on Critical Moments in Beatles History," analyzes historical news articles about the Beatles—high-profile celebrities. By focusing on public figures, Wolff et al.
reduce privacy concerns for the project at hand, but they suggest that privacy is still relevant, writing, "Were the same analysis methods applied to non-public figures, privacy considerations such as the 'right to be forgotten,' or excluded from computational analysis of available text data, would be required." For case study authors working with more sensitive records, data security and restricted access are key considerations. Elings, Friedman, and Singh, whose case study "Using AI/Machine Learning to Extract Data from Japanese American Confinement Records" focuses on extracting metadata from images in Japanese internment records, describe building, testing, and implementing a sustainable model for integrating community input from stakeholders and people represented in the collection. Elings et al. also discuss implementing access restrictions for the data. Hosseini et al., whose case study, "Ethical Considerations in Utilizing Artificial Intelligence for Analyzing the NHGRI's History of Genomics and Human Genome Project Archives," works with genomic records, describe reducing the number of records made available in order to "mitigate risks, ensure ethical compliance, and maintain data privacy standards while enabling valuable research outcomes." Such tradeoffs factor into responsible implementation of AI tools and projects.

Consent

Use of data without explicit consent is of concern to the authors in this special issue. Additional challenges arise when the source data was gathered prior to the existence of AI tools. Hosseini et al. experience this ethical tension in their biometrics study of a national archive in "Ethical Considerations in Utilizing Artificial Intelligence for Analyzing the NHGRI's History of Genomics and Human Genome Project Archives." They observe that, even if all data used is fully de-identified and there is minimal risk of harm, they are analyzing user data without explicit consent.
This approach could be viewed as undermining subjects' autonomy, a harm in and of itself. They mitigate this tension by de-identifying the information used and rendering it as encoded data, handling the data as if the participants had been asked to consent to this new use. Likewise, Wolff et al., in "'I've Got a Feeling': Performing Sentiment Analysis on Critical Moments in Beatles History," acknowledge an ethical dilemma in using the work of journalists contained in a dataset of historical news articles about the Beatles. Specifically, it is unclear whether this analysis falls under fair use when the journalists' work is part of a larger dataset. These questions about consent can help library and archives practitioners consider issues that may arise, despite little precedent in some of these arenas.

Accuracy

Several case studies highlight the ethical challenges created by the accuracy of the data. Accuracy can be influenced by AI systems themselves (such as sentiment analysis tools) or by elements of AI systems (such as OCR and named entity recognition). Wolff et al. observe that the varying accuracy of OCR can have a ripple effect in "'I've Got a Feeling': Performing Sentiment Analysis on Critical Moments in Beatles History." If the source data is inaccurate, it can cause further misinterpretation by the subsequent sentiment analysis tools used.
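As a minimal illustration of this ripple effect (not drawn from the case study; the lexicon, scores, and example sentences are invented), the sketch below shows how a single OCR character error can silently remove the very word a lexicon-based sentiment tool depends on:

```python
# Toy lexicon-based sentiment scorer, illustrating how an OCR error
# can erase sentiment signal. Lexicon and scores are invented.
LEXICON = {"fantastic": 3.0, "love": 2.5, "terrible": -3.0}

def score(text: str) -> float:
    """Average lexicon score over all tokens (unknown words count as 0.0)."""
    tokens = text.lower().split()
    return sum(LEXICON.get(t, 0.0) for t in tokens) / len(tokens)

clean = "the fans love this fantastic record"
garbled = "the fans love this fantaslic record"  # OCR read 't' as 'l'

# The garbled word falls out of the lexicon, so the score drops,
# even though a human reader recovers the same meaning.
print(score(clean) > score(garbled))  # True
```

A human review pass over the OCR output, as the case study authors suggest, is precisely what catches errors like this before they skew the aggregated scores.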
Also, these sentiment analysis tools have been trained on a specific form of writing found in social media and are not optimized for historical writing. Anderson and Duran describe challenges brought about when named entity recognition misattributes information to individuals in "Responsible AI at the Vanderbilt Television News Archive: A Case Study." Some of these concerns can be mitigated by a human review of the OCR and named entity recognition output before sentiment analysis tools are used. Feng et al. suggest that human intervention can also come into play in considering the quality and depth of information presented by various AI chatbots in "Ethical Considerations in Integrating AI in Research Consultations: Assessing the Possibilities and Limits of GPT-Based Chatbots." Librarians and others can provide guidance on the strengths and weaknesses of the different bots so researchers are more aware of potential variation in results depending on the tool used.

Labor

Several case studies in this special issue discuss ethical issues relating to labor, both for library and archives employees and for student workers. Pastva et al. discuss how AI could negatively impact library liaison services by reducing the amount of human interaction between library employees and library users in "The Implementation of Keenious at Carnegie Mellon University." McIrvin et al. ("Automatic Expansion of Metadata Standards for Historic Costume Collections") and Anderson and Duran ("Responsible AI at the Vanderbilt Television News Archive: A Case Study"), whose case studies focus on AI for metadata enhancement, are concerned with how AI could affect the jobs of library employees with metadata and cataloging expertise. All of these authors suggest that AI should be used to augment, rather than replace, library services. Beyond displacement of workers and expertise, fair labor practices were also considered. Beltran et al. touch on the ethics of student labor.
In their case study, "Open Science Recommendation Systems for Academic Libraries," they describe offering course credit to student workers in lieu of wages. To address this potential ethical challenge, Beltran and colleagues worked with the unpaid students to ensure that the students' goals were being met, co-creating learning outcomes and drafting a collaboration agreement between the students and the library.

Digital Divide

The cost of accessing AI tools can lead to a digital divide, as Feng, Wang, and Anderson discuss in their case study "Ethical Considerations in Integrating AI in Research Consultations: Assessing the Possibilities and Limits of GPT-Based Chatbots." As new AI-powered vendor tools are released, some may include free versions, but higher-quality, more accurate results are available to paid subscribers. This leads to a divide between people who can access high-quality information and those who cannot. In 2012, boyd and Crawford wrote about a divide between "the big data rich and the big data poor" (2012, 674), and this divide continues to be a concern as new technologies are developed and turned into commercial products. One development to watch is the subscription models for AI now coming onto the market. In "The Implementation of Keenious at Carnegie Mellon University," Pastva et al.
briefly touch on the pay model for Keenious, a vendor-provided resource recommender system. This seems likely to be an area where the digital divide will appear within libraries: communities with fiscal resources will be able to pay for personal recommendation bot assistants for their patrons, while other communities will not have access to these technologies.

Bias

Several of the case studies in this special issue discuss potentially biased results and strategies to help reduce that bias. In "Automatic Expansion of Metadata Standards for Historic Costume Collections," McIrvin et al. suggest that subject heading biases may be present in the controlled vocabularies used by their AI model. To reduce potential bias, the team enlisted a domain expert to review automatically generated metadata and engaged with diverse metadata sources to avoid misleading or culturally insensitive terms. Pastva et al. ("The Implementation of Keenious at Carnegie Mellon University") and Beltran et al. ("Open Science Recommendation Systems for Academic Libraries") discuss how recommender systems may be biased. Pastva et al. point out that recommender systems and relevance ratings may show biases toward certain funding sources, legal jurisdictions, or countries of origin. They also note that students new to the research process might not be able to recognize bias when using new tools. Beltran et al.
interrogate the implications of persuasive design writ large, asking, "How can we build a model that does not invite the undue or unwanted influence of library services or introduce bias but ultimately is helpful and protects the users' autonomy?" Beltran et al. offer that one potential answer to this question is to enhance transparency: to design a recommender tool that explains why a recommendation was made. Other researchers suggest that bias can also be found in the data and the training models used. Wolff et al. worked on sentiment analysis of historical news about the Beatles. In "'I've Got a Feeling': Performing Sentiment Analysis on Critical Moments in Beatles History," they tested several models analyzing OCR text from historical newspaper reports. The group found the models adequate but noted bias (and some false classifications) because the training data came from a current social media corpus that did not match historical uses of language in the newspapers. Of note here is the need to use training data that matches the data to be analyzed. Recognizing and anticipating bias in training data, and consequently setting up custom training, was proposed as a way to avoid bias. McIrvin et al. followed this solution in "Automatic Expansion of Metadata Standards for Historic Costume Collections" by incorporating metadata terms using an inclusive description model to help remove bias in the generated subject terms.

Transparency

A number of case study authors refer to transparency and explainability as core requirements for AI systems. In "Open Science Recommendation Systems for Academic Libraries," Beltran et al. connect proprietary recommendation systems' lack of explainability for their recommendations to bias and distrust in the system.
Others (see "The Implementation of Keenious at Carnegie Mellon University") go further in reviewing sources of the AI models and trying to draw out "algorithmic transparency" for patrons in their work on recommendation systems. Guides that explain the decisions (algorithms) and data sources are suggested as a path toward building trust in the system. Others define their moves toward responsibility and accountability as directives that indirectly create transparency. Hosseini et al. see the responsible release of open-source code and explanations of how to use the code as part of this continuum of transparency in "Ethical Considerations in Utilizing Artificial Intelligence for Analyzing the NHGRI's History of Genomics and Human Genome Project Archives."

Conclusion

The goal of this special issue is to provide examples of how practitioners can ethically and responsibly engage with AI tools and systems. The ethical issues raised in these case studies show that even as AI tools grow and change, our common professional values and ethical concerns as library and archives practitioners remain the same. We hope that when other practitioners read these case studies, they will be able to translate the ethical considerations and harm-reduction strategies in the case studies to their own work with AI.
—Special Issue Guest Editors

Sara Mannheimer, Data Librarian, Montana State University
Doralyn Rossmann, Dean of the Library, Montana State University
Jason Clark, Head of Research Optimization, Analytics, and Data Services, Montana State University
Yasmeen Shorish, Director of Scholarly Communications Strategies, James Madison University
Natalie Bond, Government Information Librarian and Head of Information & User Services, University of Montana
Hannah Scates Kettler, Associate University Librarian for Academic Services, Iowa State University
Bonnie Sheehey, Associate Director, Center for Science, Technology, Ethics, and Society, and Assistant Professor, History & Philosophy, Montana State University
Scott W. H. Young, User Experience and Assessment Librarian, Montana State University

Acknowledgements

We would like to thank the Responsible AI project advisory board: Dorothy Berry, Stephanie Russo Carroll, María Matienzo, Bohyun Kim, and Thomas Padilla. This project is made possible in part by the Institute of Museum and Library Services, through grant number LG-252307-OLS-22.

Competing Interests

The authors declare that they have no competing interests.

References

boyd, danah, and Kate Crawford. 2012. "Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon." Information, Communication & Society 15 (5): 662–679. https://doi.org/10.1080/1369118X.2012.678878.
"I've Got a Feeling": Performing Sentiment Analysis on Critical Moments in Beatles History

Journal of eScience Librarianship 13 (1): e849
DOI: https://doi.org/10.7191/jeslib.849
ISSN 2161-3974

Full-Length Paper

Milana Wolff, University of Wyoming, Laramie, WY, USA, mwolff3@uwyo.edu
Liudmila Sergeevna Mainzer, University of Wyoming, Laramie, WY, USA
Kent Drummond, University of Wyoming, Laramie, WY, USA

Abstract

Our project involved the use of optical character recognition (OCR) and sentiment analysis tools to assess popular feelings regarding the Beatles and to determine how aggregated sentiment measurements changed over time in response to pivotal events during the height of their musical career. We used Tesseract to perform optical character recognition on historical newspaper documents sourced from the New York Times and smaller publications, leveraging advances in computer vision to circumvent the need for manual transcription. We employed state-of-the-art sentiment analysis models, including VADER, TextBlob, and SentiWordNet, to obtain sentiment analysis scores for individual articles (Hutto and Gilbert 2014; TextBlob, n.d.; Baccianella, Esuli, and Sebastiani 2010). After selecting articles mentioning the group, we examined the changes in average sentiments displayed in articles corresponding to critical moments in the Beatles' musical career to determine the impact of these events.
Received: November 15, 2023 | Accepted: February 5, 2024 | Published: March 6, 2024

Keywords: Beatles, sentiment analysis, optical character recognition, historical newspaper archives, artificial intelligence, AI

Citation: Wolff, Milana, Liudmila Sergeevna Mainzer, and Kent Drummond. 2024. "'I've Got a Feeling': Performing Sentiment Analysis on Critical Moments in Beatles History." Journal of eScience Librarianship 13 (1): e849. https://doi.org/10.7191/jeslib.849.

Data Availability: GitHub repository: https://github.com/wyoarcc/strawberryfields

The Journal of eScience Librarianship is a peer-reviewed open access journal. © 2024 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0), which permits unrestricted use, distribution, and reproduction in any medium for non-commercial purposes, provided the original author and source are credited, and new creations are licensed under the identical terms. See https://creativecommons.org/licenses/by-nc-sa/4.0.

AI Activity Overview

The Advanced Research Computing Center at the University of Wyoming, where this study was conducted, maintains a wide variety of ongoing AI research projects.
Other applications of AI include the development of encoder/decoder and recurrent neural networks to predict the phylogenetic evolution and discover critical mechanisms of disease genomes such as colorectal cancer and COVID-19; use of optical character recognition to process radiocarbon dating cards; and real-time image detection and annotation using YOLOv8 for applications including player tracking during sports events and tracking animals during clinical experiments. As a graduate research assistant, Wolff is responsible for developing neural networks in the first project mentioned. As director of the Advanced Research Computing Center, Sergeevna Mainzer manages and coordinates all AI projects within the organization. Drummond is an English professor, consulted for his expertise on the Beatles aspect of the project; he has no further affiliation with ARCC or the AI activities conducted therein.

Summary

This project focused on the use of artificial intelligence-enhanced language processing to extract the positive or negative valence of sentiments expressed in historical newspaper archives centered on coverage of the Beatles music group over the course of their career. We utilized Tesseract, an optical character recognition tool, to obtain the raw text from digitized copies of New York Times articles and other publications from the Adam Matthew popular culture archives. We performed sentiment analysis on all articles within the dataset using three Python-based natural language processing models. Once we obtained positive and negative values for individual articles, we examined the articles with the strongest emotional language and determined which events in Beatles history differed significantly from the general background sentiment expressed at the time.
Project Details

Methodology

We investigated whether different time periods corresponding to critical changes in the Beatles' career trajectory produced changes in public sentiment surrounding the group, with a particular focus on the release and legacy of the song "Strawberry Fields Forever." Since the number of publications referencing the Beatles extends far beyond the capacity of a human to read, we used sentiment analysis to highlight the greatest shifts in public sentiment and extract the most relevant articles for perusal. To define critical events in Beatles history, we selected a number of important dates and segmented the dataset according to publication within the intervals between those dates. On August 12, 1960, the Beatles adopted the name "Beatles." We consider this date the starting point for the Beatles in their most identifiable form as a band. On October 17, 1962, the Beatles appeared on television for the first time, marking their first major appearance in the public eye. On February 9, 1964, the Beatles appeared on The Ed Sullivan Show, catapulting the group more fully into the public consciousness, especially to international audiences. On July 29, 1966, an interview with John Lennon, in which he claims the Beatles are "more popular than Jesus," was republished for an American audience, drawing outrage from religious populations in the United States. On August 29, 1966, the Beatles performed their final concert. On February 17, 1967, the two-sided single "Strawberry Fields Forever" and "Penny Lane" was released. On April 10, 1970, the Beatles formally disbanded. On December 8, 1980, Lennon was assassinated in front of his residence at the Dakota. At the end of August 1981, the Strawberry Fields memorial in Central Park was announced by Lennon's spouse, Yoko Ono.
On October 9, 1985, the Strawberry Fields memorial was dedicated. We considered the articles published in the intervals between these dates for analysis. In order to obtain data for analysis, we selected two data sources. We retrieved all articles from the New York Times digitized historical archive that referred to both the Beatles and Strawberry Fields, as determined by keyword search. Due to limitations of the database, and despite negotiations with both the database provider and the University of Wyoming Libraries, we were unable to acquire a bulk download of the archive. Obtaining data from a variety of sources would have provided a more holistic view of popular sentiments towards the Beatles. To supplement these articles, we obtained data from the 1950–1975 popular culture dataset consisting of magazine articles and newspapers provided by Adam Matthew (Adam Matthew Digital 2023). This dataset was provided in XML format, and the text from these items had already been extracted. Since this dataset was both larger and had a wider scope, we relied more heavily on the popular culture archives than on the New York Times, from which we obtained a mere 159 usable articles. Of the 6.3 million popular culture articles, 5.8 million contained usable information regarding publication date and were considered suitable for analysis. While the Adam Matthew dataset contained digitized text, the New York Times dataset consisted of document scans of the original historical newspapers. We used optical character recognition to extract the text from these images. Optical character recognition (OCR) describes the process of computationally identifying characters in handwritten or typed text, often sourced from historical archives without digitized counterparts.
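The interval segmentation described above can be sketched roughly as follows. The milestone dates are taken directly from the text; the helper function and article-date format are invented for illustration, not drawn from the project's actual code:

```python
from bisect import bisect_right
from datetime import date

# Critical dates in Beatles history, as listed in the methodology.
MILESTONES = [
    date(1960, 8, 12),   # band adopts the name "Beatles"
    date(1962, 10, 17),  # first television appearance
    date(1964, 2, 9),    # The Ed Sullivan Show
    date(1966, 7, 29),   # "more popular than Jesus" republished in the US
    date(1966, 8, 29),   # final concert
    date(1967, 2, 17),   # "Strawberry Fields Forever" / "Penny Lane" released
    date(1970, 4, 10),   # band disbands
    date(1980, 12, 8),   # Lennon assassinated
    date(1981, 8, 31),   # Strawberry Fields memorial announced
    date(1985, 10, 9),   # memorial dedicated
]

def interval_index(published: date) -> int:
    """Return which inter-milestone interval an article falls into
    (0 = before the first milestone, 1 = between the first and second, ...)."""
    return bisect_right(MILESTONES, published)

# An article from March 1964 lands in the interval opened by the
# Ed Sullivan appearance (three milestones precede it).
print(interval_index(date(1964, 3, 1)))  # 3
```

Grouping every article by its interval index then gives per-period collections whose average sentiment can be compared across adjacent periods.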
Since the original documents cannot be searched, nor the text contents analyzed, without additional processing, we leveraged Tesseract, an OCR engine developed in 1984 at HP Labs and adopted by Google in the early 2000s (Smith 2007). Tesseract extracts text from scanned documents or photographs and returns the text in the form of computer-readable characters. The process involves a first stage of connected component analysis, wherein the program identifies the outlines of individual characters in the document. Collections of outlines are organized into lines and regions of text. Each region is further subdivided into words according to character spacing, and each word is passed to an adaptive classifier. A second pass may be completed depending on the confidence of the result. In this manner, Tesseract produces a sequence of words matching the original document with relatively high accuracy, depending on the quality of the original image (Smith 2007). After executing Tesseract on the New York Times dataset, we performed sentiment analysis on the augmented dataset consisting of both the Adam Matthew publications and the newspaper articles. We utilized three sentiment analysis packages with Python implementations: the Python Natural Language Toolkit implementation of SentiWordNet and the Python modules VADER Sentiment and TextBlob. SentiWordNet expands the Princeton WordNet gloss corpus using a semi-supervised learning method based on the relationships between synonyms and antonyms. These sets of synonyms are called "synsets." SentiWordNet uses a "bag of synsets" model, considering all synonyms used for terms in the text. The "bag of synsets" method expands on the older sentiment analysis "bag of words" model, which considers individual words in a document rather than their syntactic relationship.
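The contrast between the two models can be sketched with a toy example. The synonym groups and scores below are invented for illustration; the real SentiWordNet derives its synsets and scores from the WordNet gloss corpus rather than a hand-written table:

```python
# Toy contrast between a "bag of words" and a "bag of synsets" scorer.
# Scores and synonym groups are invented for illustration only.

WORD_SCORES = {"joyful": 0.8}            # only one surface form is listed
SYNSETS = {                               # synonyms share one scored entry
    "joyful": ("joy.synset", 0.8),
    "elated": ("joy.synset", 0.8),
    "gleeful": ("joy.synset", 0.8),
}

def bag_of_words(tokens):
    """Average per-word scores; unknown surface forms contribute 0."""
    return sum(WORD_SCORES.get(t, 0.0) for t in tokens) / len(tokens)

def bag_of_synsets(tokens):
    """Average per-synset scores, so synonyms of a scored word still count."""
    return sum(SYNSETS.get(t, (None, 0.0))[1] for t in tokens) / len(tokens)

tokens = ["elated", "crowds", "greeted", "them"]
# "elated" is missed by the word model but caught via its synset.
print(bag_of_words(tokens), bag_of_synsets(tokens))
```

The synset model recognizes sentiment in "elated" because it shares a synset with the scored entry, while the plain word model scores the sentence as neutral.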
by determining the average sentiment assigned to terms used in a given document, we can obtain a single score for that text (baccianella, esuli, and sebastiani 2010). vader (valence aware dictionary and sentiment reasoner) is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. the algorithm incorporates word-order-sensitive relationships between terms; for example, degree modifiers or intensifiers raise or lower sentiment intensity (hutto and gilbert 2014). textblob works similarly to vader; it uses wordnet, accounts for negation, intensifiers, and negated intensifiers, and averages polarity across a given piece of text (textblob, n.d.). we obtained positive or negative values associated with each article in the dataset, representing the general valence of each text. we aggregated these scores over the time periods we defined and performed statistical analysis to determine whether each time period differed from the subsequent one, suggesting a public reaction to one of the critical events described above. we were able to determine which articles contributed most significantly to the overall sentiment of a given time period by selecting the maximally and minimally scored publications within each time frame. contributors contributors included milana wolff, a ph.d. candidate in computer science employed as a graduate research assistant at the advanced research computing center (arcc); kent drummond, a professor in the english department at the university of wyoming; liudmila sergeevna mainzer, the director of arcc; and chad hutchens, the chair of digital collections at the university of wyoming libraries.
contributor roles sergeevna mainzer proposed collaboration between arcc employees and the humanities departments at the university, and drummond suggested the idea of using computational resources to better understand the beatles and the public response to them. the details of the data sources to be analyzed and the methods for analysis were developed jointly by sergeevna mainzer, drummond, and wolff. hutchens provided access to the new york times historical database and obtained the adam matthew popular culture dataset. wolff organized, cleaned, and processed the data using tesseract, writing the entirety of the code for the optical character recognition pipeline used in this project. furthermore, wolff deployed existing natural language processing models and performed sentiment analysis and further statistical analysis on the dataset. wolff and sergeevna mainzer were responsible for developing the initial journal proposal, while wolff drafted the final version. services we utilized the services of coe libraries at the university of wyoming, in addition to computing time on the teton cluster (now retired) at the advanced research computing center. collections we used the new york times historical archive provided by proquest and the popular culture dataset provided by adam matthew. technologies & infrastructure we used tesseract ocr for the optical character recognition stage of the pipeline and the python natural language toolkit implementation of sentiwordnet, as well as the python modules vader sentiment and textblob, for sentiment analysis. we used basic statistical functions to conduct data analysis, and modules including pandas and matplotlib for data organization, cleaning, and visualization.
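the aggregation and extreme-article selection steps described earlier can be sketched with pandas. everything below (dates, scores, titles, and period boundaries) is invented for illustration; the project's actual code lives in its github repository.

```python
# sketch of the aggregation step with pandas: group per-article sentiment
# scores into event-bounded time periods, compute a mean valence per
# period, and pull the highest-scoring article in each period.
# all values here are invented for illustration.
import pandas as pd

articles = pd.DataFrame({
    "date": pd.to_datetime(["1966-03-04", "1967-02-17",
                            "1980-12-09", "1985-10-10"]),
    "score": [-0.42, 0.65, -0.88, 0.30],
    "title": ["more popular than jesus", "strawberry fields single",
              "lennon assassinated", "memorial dedicated"],
})

# assign each article to a period bounded by (hypothetical) critical events
bins = pd.to_datetime(["1960-01-01", "1967-01-01",
                       "1981-01-01", "1990-01-01"])
articles["period"] = pd.cut(articles["date"], bins=bins,
                            labels=["early", "middle", "late"])

# mean valence per period, for comparing consecutive periods
mean_by_period = articles.groupby("period", observed=True)["score"].mean()

# maximally scored publication within each time frame (use idxmin for the
# minimally scored one)
peaks = articles.loc[articles.groupby("period", observed=True)["score"].idxmax()]
```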
challenges most challenges encountered in the course of implementation arose from technical issues with differing versions of tesseract dependencies and pre-existing installations on the computing cluster. obtaining and cleaning the raw data presented a challenge, especially since the proquest database limited downloads from the new york times historical archives, and errors in ocr propagated throughout the dataset. the formatting of the adam matthew dataset and inconsistent use of date conventions created further challenges when organizing a strongly time-dependent dataset. finally, performing sentiment analysis on a dataset containing several million articles is a resource-intensive endeavor, and small code errors often created much larger problems when applied to the entire dataset. background implementation decision to bridge the humanities departments at the university with the computational expertise and resources available at arcc, we decided to implement a project leveraging aspects of both domains. drummond, an english professor studying the beatles, proposed an investigation of historical documents. wolff and mainzer suggested sentiment analysis as a possible application of the available computational resources. we implemented this ai-based research method to allow drummond and future researchers to understand the broader sentiments surrounding events and the changes in those sentiments as possible responses to crucial moments. furthermore, the sentiment analysis strategy we deployed allows researchers not only to understand the context of widespread popular sentiment, whether in general or in relation to particular keywords, but also to extract the articles or documents most responsible for influencing sentiment valence scores.
in this manner, historical and popular culture researchers can avoid reading millions of articles and focus on the most emotionally charged among them to better glimpse the general sentiments displayed at the time. we obtain both aggregated and highly specific views of the same textual data without the tedious effort of inspecting, transcribing, and applying human interpretation to every document in a massive corpus. benefits as mentioned in the previous section, our approach combines ocr and sentiment analysis to enable historical researchers to minimize time spent on easily automated tasks such as transcription, keyword search, segmentation around particular dates, and identifying salient articles in a dataset. we allow researchers to focus instead on interpreting and analyzing the most critical documents and drawing more general conclusions based on the sentiment scores assigned to particular days, times, and document groupings. problems addressed we address one of the major issues facing researchers in many topics involving archival research: obtaining relevant documents to support an argument. by performing optical character recognition on digitized documents, we convert historical text into an easily searchable format. by applying a variety of sentiment analysis methods, we distill each source document into a positive or negative valence, as well as providing a measure of subjectivity. we thus address the problem of manually searching massive archives for useful articles and instead allow researchers to narrow down their searches effectively. inspiration sentiment analysis is used across a variety of domains, from marketing research to musical analysis. we drew inspiration from previous work related to analyzing the music produced by the beatles and from the sentiment analysis domain as a whole.
ethical considerations while our project relies primarily on publicly available historical documents, and therefore has negligible impact on current users, we acknowledge the inherent ethical concerns posed by any large-scale sentiment analysis and the application of what are often black-box models. ocr relies on predictive models built on the expectations of finding certain characters and words in text, and only produces text with 72-90% accuracy, depending on how well the input data matches model expectations. the inaccuracy of the underlying ocr results impacts the sentiment analysis results, as certain words and their associated sentiments appear more frequently in the processed data than in the source material. when scaled to our dataset, comprising well over 5 million individual articles, inaccuracies accumulate and produce sentiment polarities inconsistent with the original data. drawing incorrect conclusions about the feelings of the general population based on inaccurate models alters how we perceive the past and our relation to historical events. likewise, sentiment analysis poses a number of ethical concerns. many modern sentiment analysis models are trained on data from social media websites. for example, vader was trained on data sourced from twitter users. the contrast between published historical writing and more casual modern writing can generate inaccurate scoring of sentiments in models attuned to one particular mode of communication. furthermore, quantifying sentiment as positive or negative obfuscates the emotions displayed (models may score both anger and sadness as negative).
in losing granularity and context, such as distinctions between emotion directed towards individuals (anger at john lennon’s claim that the beatles were “more popular than jesus”) versus emotion describing events (sadness at lennon’s assassination), we risk misinterpreting and misrepresenting published opinions of individuals, potentially affecting the reputation of the writer or the subject. potential harms in misinterpreting the output of aggregated sentiment models, we risk drawing inaccurate conclusions about the social forces driving popular opinions, ultimately undermining our efforts. furthermore, sentiment analysis models aim to provide objective metrics on subjective data. the strength of the conclusions we draw and, ultimately, the way these conclusions reflect on the subjects and authors of the source material depend not only on the accuracy of these models but also on their ability to capture nuance. for example, one of the most negatively rated articles in the dataset contains the words “strawberry fields,” but the article described detainees in an area of guantanamo bay known as “strawberry fields,” with the implication that these individuals would remain there “forever.” while the article provides excellent commentary on the influence of musical and artistic works on the world, the negative sentiment is not directed at the beatles at all. furthermore, citing this article absent context and an explanation of the analysis methods used might reflect negatively on the article’s author as well as on the beatles as a peripheral subject of this piece. privacy considerations as all training data for the sentiment analysis models used and all publications analyzed were available under fair use, and since analysis centered around public figures with limited expectations of privacy, most major privacy considerations did not factor into this project.
however, historical newspapers were published before digitization and large-scale analysis became acceptable research methodologies. therefore, were the same analysis methods applied to non-public figures, privacy protections such as the “right to be forgotten,” or at least the right to be excluded from computational analysis of available text data, would be required. user consent while we did not obtain the explicit consent of the journalists whose work we included in the dataset, publication in major media outlets such as the new york times grants some implicit consent for fair use, including reading, analysis, and reproduction under limited circumstances. however, whether availability for large-scale computational analysis falls under this domain remains an unresolved question. stakeholder engagement stakeholder engagement was not applicable to this project. existing documentation, policy, & best practices we followed general recommendations from the computer science and sentiment analysis communities when conducting this research. according to the acm code of ethics, “computing professionals should only use personal information for legitimate ends and without violating the rights of individuals and groups.” in a research context, using published works circulated in a public medium avoids many of the ethical considerations involved with more ambiguously public information, such as tweets or social media postings. we consider the advancement of understanding of social trends a legitimate end for research. furthermore, data are considered in aggregate and are thus afforded a level of anonymization during the sentiment analysis process (acm, n.d.). in sentiment analysis communities, most existing recommendations surround the use of twitter and other social media data.
researchers often discuss the need to minimize identification of specific individuals based on writing styles or direct quotations, the use of metadata surrounding text analysis (particularly on twitter or other online communities where geographic or location data becomes relevant), and whether explicit consent of users is required. at this time, these issues remain unresolved, and many publications leverage twitter data without seeking irb approval or the explicit consent of users. without a clear ethical framework to apply, and noting the vast differences between journalism pieces published in widely distributed newspapers and privately shared tweets, we proceeded with caution and considered results primarily in aggregate (gupta, jacobson, and garcia 2007; takats et al. 2022; webb et al. 2017). ethical codes we referenced both the acm code of ethics and ethical considerations commonly discussed in sentiment analysis studies and derived general approaches from these sources. however, since we were conducting analysis of previously published newspaper articles, we did not follow a specific ethical code for interacting with the source material, as we were unable to find recommendations applying exactly to our work. risk-benefit analysis we considered the risks of large-scale analysis and the possibility of drawing erroneous conclusions; however, we also observed the benefits of unprecedented large-scale analysis in a frequently overlooked domain. as a precaution against misinterpreting the results, we retained the original articles for human perusal rather than machine interpretation alone. user community & library concerns we discussed the project with members of the university of wyoming libraries and were met with enthusiastic feedback; no parties reported any concerns about the ethical use of the data.
unresolved considerations we are not aware of any unresolved considerations at this time. impact at the present time, this project impacts the university of wyoming libraries, the advanced research computing center, and the english department at the university. the project has also been introduced to the community surrounding the university through an open technology forum (techtalk laramie). the libraries submitted a text and data mining request to adam matthew, after which we collaborated with adam matthew to obtain ftp access to the dataset. initiating a dialogue with adam matthew and assisting with data acquisition paperwork proved instrumental to data acquisition for this project. the libraries also described earlier attempts to mass download from proquest and the issues encountered as a result, such as the possibility of license suspension if we attempted this strategy without consulting proquest; this deterred us from attempting to programmatically circumvent the download limit. this project impacted the libraries by fostering new connections with adam matthew and with university collaborators. arcc performed the analysis, and a member of the english department directed many aspects of this project. the ai implementation described above has fostered interdisciplinary collaboration and has provided valuable insights for this particular project domain. future work we plan to expand the scope of this project by introducing additional sentiment analysis methods, including several r packages, some of which offer finer-resolution sentiment scores for emotions such as anger, fear, and happiness.
we also plan to incorporate more data from a wider variety of sources for comparative analysis of coverage in the new york times and the adam matthew popular culture dataset against more conservative outlets, such as the christian science monitor. in the future, ethical and responsible implementation of sentiment analysis methods and other forms of artificial intelligence will require a more robust interrogation of the training datasets as well as the datasets used in the project. sentiment analysis validity depends heavily on context, and models can miss nuances such as historical shifts in language usage, sarcasm, and other literary devices employed in publications. it may be advisable to retrain or fine-tune the models used on “background” literature from the same time periods, or to engage in more in-depth human review of the validity of the scoring metrics. however, we believe our contribution represents a valuable advance in the field and a new approach to understanding the broad context and general sentiments related to historical events, allowing researchers to extract previously hidden trends. we recommend that others pursuing similar work implement more sentiment analysis methods, including those with more robust or specifically selected training sets, use a wider range of data sources, and consider additional expert review of some of the articles or publications within the dataset to verify that the sentiment analysis metrics function as expected. documentation see the project github repository for the python code used for data organization, cleaning, analysis, and visualization. data availability github repository: https://github.com/wyoarcc/strawberryfields. acknowledgements this research was supported by the advanced research computing center at the university of wyoming. the research case study was developed as part of an imls-funded responsible ai project, through grant number lg-252307-ols-22.
competing interests the authors declare that they have no competing interests. references adam matthew digital. 2023. “popular culture in britain and america, 1950-1975.” december 21, 2023. https://www.amdigital.co.uk/collection/popular-culture-in-britain-and-america-1950-1975. baccianella, stefano, andrea esuli, and fabrizio sebastiani. 2010. “sentiwordnet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining.” in proceedings of the seventh international conference on language resources and evaluation (lrec’10). valletta, malta: european language resources association (elra). http://www.lrec-conf.org/proceedings/lrec2010/pdf/769_paper.pdf. gupta, manisha, nathaniel jacobson, and eric k. garcia. 2007. “ocr binarization and image pre-processing for searching historical documents.” pattern recognition 40 (2): 389–397. https://doi.org/10.1016/j.patcog.2006.04.043. hutto, clayton j., and eric gilbert. 2014. “vader: a parsimonious rule-based model for sentiment analysis of social media text.” proceedings of the international aaai conference on web and social media 8 (1): 216–225. https://doi.org/10.1609/icwsm.v8i1.14550. smith, ray. 2007. “an overview of the tesseract ocr engine.” in proceedings of the 9th international conference on document analysis and recognition (icdar 2007), 629–633. curitiba, paraná, brazil. https://doi.org/10.1109/icdar.2007.4376991. takats, courtney, amy kwan, rachel wormer, dari goldman, heidi e. jones, and diana romero. 2022. “ethical and methodological considerations of twitter data for public health research: systematic review.” journal of medical internet research 24 (11): e40380.
https://doi.org/10.2196/40380. acm. n.d. “acm code of ethics and professional conduct.” http://www.acm.org/about-acm/acm-code-of-ethics-and-professional-conduct. textblob. n.d. “tutorial: quickstart.” textblob documentation. https://textblob.readthedocs.io/en/dev/quickstart.html#sentiment-analysis. webb, helena, marina jirotka, bernd carsten stahl, william housley, adam edwards, matthew williams, rob procter, omer rana, and pete burnap. 2017. “the ethical challenges of publishing twitter data for research dissemination.” in proceedings of the 2017 acm on web science conference (websci ’17), 339–348. new york: association for computing machinery. https://doi.org/10.1145/3091478.3091489. open science recommendation systems for academic libraries journal of escience librarianship 13 (1): e804 doi: https://doi.org/10.7191/jeslib.804 issn 2161-3974 full-length paper open science recommendation systems for academic libraries lencia beltran, carnegie mellon university, pittsburgh, pa, usa, lbeltran@andrew.cmu.edu chasz griego, carnegie mellon university, pittsburgh, pa, usa lauren herckis, carnegie mellon university, pittsburgh, pa, usa abstract an interdisciplinary academic team offers a comprehensive case study describing the development of a predictive model as the cornerstone for an open science recommendation system tailored to the carnegie mellon university community.
this initiative will empower users to choose open science services that align with their academic requirements, introduce academics to resources they find valuable, and bridge gaps within academic library service offerings. as an institution with a longstanding commitment to a science-informed approach and a focus on computer science, engineering, and artificial intelligence, carnegie mellon university has enthusiastically embraced open science practices. the carnegie mellon university libraries have been instrumental in bringing these practices into our academic landscape. received: october 29, 2023 accepted: february 5, 2024 published: march 5, 2024 keywords: open science, artificial intelligence, ai, recommendation system, higher education, academic library services, ethical considerations citation: beltran, lencia, chasz griego, and lauren herckis. 2024. “open science recommendation systems for academic libraries.” journal of escience librarianship 13 (1): e804. https://doi.org/10.7191/jeslib.804. data availability: beltran, lencia, chasz griego, and lauren herckis. 2024. “open science recommendation systems for academic libraries.” osf. https://doi.org/10.17605/osf.io/px6hj. the journal of escience librarianship is a peer-reviewed open access journal. © 2024 the author(s). this is an open-access article distributed under the terms of the creative commons attribution 4.0 international license (cc-by 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. see http://creativecommons.org/licenses/by/4.0.
abstract continued the authors strive to develop a predictive model that will evolve into a recommendation system. the pursuit of this endeavor has led the authors through several ethical considerations, such as data privacy, the involvement of student contributors, and the design of a persuasive recommendation system. we are committed to exploring ethical approaches for delivering user-centered recommendations and to preserving individual autonomy. the authors have actively engaged with diverse academic departments, students, and faculty, embarking on data exploration and applying open science principles throughout the process. the resulting system will raise awareness of library services and deliver tailored recommendations for the adoption of proven research tools and practices. this case study serves as an exemplar of how universities can enact open science principles and develop systems that prioritize the user’s interests, navigate institutional complexities to forge interdisciplinary collaboration, and muster resources to support innovative, multi-disciplinary efforts. introduction a carnegie mellon university team aims to build a predictive model to act as a foundation for the development of an open science recommendation system for the campus community. this model will employ user characteristics to identify services that are a good fit for users’ academic needs from a universe of the library’s open science resources.
developing this model will help us understand which users engage with each service and identify potential users who would benefit from additional open science resources. the recommendation system will introduce naive users to services they are likely to find valuable and, in parallel, introduce current users to alternative capabilities that are likely to be of value. at the same time, our team will consider and speak to the ethical implications of generating a novel system of this kind in an academic setting. this system will also serve as a proof of concept for other academic library service recommendation systems. project details carnegie mellon university (cmu) is a private, global research university that has championed a science-informed approach for more than five decades and is consistently ranked among international leaders in computer science, engineering, and artificial intelligence. nobel laureate, turing award winner, and artificial intelligence pioneer herb simon created interdisciplinary pathways for research and innovation that are still characteristic of carnegie mellon university’s unique academic ecosystem. today, carnegie mellon university offers degrees with a focus on artificial intelligence at the bachelor’s, master’s, and doctoral levels. cmu-ai, a transdisciplinary effort focused on artificial intelligence, unites students, faculty, and staff from all areas of the university to engage with complex challenges and to partner with corporations, non-profits, and research institutions around the world. in addition to extensive research and development efforts, cmu actively fosters entrepreneurship. this has resulted in a number of spin-off corporations which embed cmu-developed, ai-enabled innovations ranging from self-driving cars to smart textbooks.
this case study is authored by the project team: lencia beltran, chasz griego, and lauren herckis. lencia beltran is the open science program coordinator for the carnegie mellon university libraries open science program. her educational background is in linguistics, speech-language pathology, and librarianship and archival studies. beltran received training in data science through drexel university’s leading program and in ai applications from the idea institute on ai. her research falls within the spheres of ai, geospatial mapping, social networks, and the implications of technology on language, including diversity, identity, and belonging in higher education and academia. for this project, beltran supported project establishment, initiation and management, research design, documentation, and the building of institutional collaborations. dr. chasz griego is a science and engineering liaison librarian and formerly an open science postdoctoral associate at the carnegie mellon university libraries. his educational background is in chemical engineering, with a focus in computational chemistry and catalysis. his doctoral work focused on physical models coupled with machine learning to expedite catalyst screening projects. his research focuses on the influence of open science tools on reproducibility in computational research related to ai, simulations, and modeling. for this project, dr. griego supported research design, data curation, and technical recommendations for model development. dr. lauren herckis is an anthropologist by training and has a faculty appointment in the university library and the school of computer science’s human-computer interaction institute. her research explores the adoption and use of ai-augmented and collaborative learning tools, the digitalization of higher education, and the design of tools to help faculty employ effective technology-enhanced learning tools with fidelity. for this project, dr.
herckis supported research design and data analysis, co-developed strategies for tool deployment, data curation, service delivery, and evaluation, and facilitated partnerships with institutional collaborators. background carnegie mellon has championed a science-informed approach for more than five decades and is committed to designing and facilitating transformative educational experiences, accelerating research and creative inquiry, developing innovative library infrastructure, and evolving to enable students, staff, and faculty to discover, access, and use scholarly information. core project team members are affiliated with the university library and have a professional interest in enhancing library services. carnegie mellon has invested in an open science program in recent years, and project personnel are both personal and professional champions of open science practices. the proposal of a recommendation system derived from the idea to create a predictive model that would shed light on usage patterns of open science services. in 2021, members of the open science team ran an analysis to evaluate the program’s impact, using data collected over the span of two years, which included service offerings and, in most cases, high-level user information like departmental affiliation. the analysis of the preliminary data showed the effectiveness of the program in the campus community, yet it did not provide further details on why users opted for specific services and their motivations behind those decisions. in an effort to understand our users and their motivations, we pursued the next step to build a predictive model. the predictive model will not only give us insight into these essential areas but also has the potential to identify probable user groups who would benefit from open science resources.
the findings from the predictive analysis can reveal where gaps in our services lie, in terms of which departments are not using our services, so that we can begin to develop resources to meet the needs of those departments. the inspiration for developing a recommendation system arose as we considered strategies for simplifying the discovery process of our services for users. the recommendation system built from this predictive model will extend how the open science program delivers services and how users discover these services. this recommendation system will facilitate information access and retrieval and alleviate the information overload that students, faculty, and staff may feel as a product of having too many options and not knowing which services will be the most helpful. the mechanics of the recommendation system will work similarly to other well-known models, like those of pinterest, amazon, and netflix, by providing users with a curated list of results. how the services are delivered to users is an aspect we are still thinking through. as we move forward and look to other projects for guidance, we intend to keep our users at the center of each approach. members of the open science program in the university libraries have spearheaded the effort to implement this recommendation system. the libraries serve the efforts of the university to continually innovate education and research by supporting the curriculum as well as faculty and student research. one area in which the libraries are leading is innovation around open science, a fairly new concept in the united states. our open science program has helped propel the integration of many open science elements into the education landscape of our community.
This team comprises several faculty and staff in the Libraries. The members actively involved in this project include Lencia Beltran, the open science program coordinator, and Chasz Griego, a science and engineering librarian who was formerly an open science postdoctoral associate. Along with these associates, Lauren Herckis, an anthropologist and affiliate of the Libraries, the Simon Initiative, and the Human-Computer Interaction Institute at CMU, has contributed to our efforts to identify collaborators and develop strategies for assessing how users will engage with open science service and tool delivery in educational and research settings. We also recruited an undergraduate student, Zhijin Wu, majoring in information systems, human-computer interaction, and business administration, who is volunteering as a project manager and coordinator to gain academic and professional experience.

The open science program offers many resources, including data and research consultations and library workshops in Data and Software Carpentries. Our patrons also have access to several tools and platforms that facilitate open science practices: KiltHub, our institutional repository; LabArchives, a platform for electronic lab notebooks; protocols.io, a repository for sharing records of research methods; and Open Science Framework, a platform for collaborative research management. With such resources and tools in place, open science program personnel have initiated an ambitious plan to amplify engagement at Carnegie Mellon.
In 2021, the CMU open science team, including past and present members, explored and gathered user data dating back to 2019 for over 900 users at CMU, counting usage and numbers of items uploaded on digital platforms as well as interactions with our other tools and services (Wang et al. 2022). This dataset has driven initial insight into our efforts to establish an open science recommendation system in the Libraries. In the spring 2023 semester, we partnered with a faculty-led team of four Master of Statistical Practice (MSP) students who agreed to undertake data exploration and develop, as their master's degree capstone project, a proof-of-concept AI-enabled predictive model that would identify likely use cases for open science tools and resources at CMU. So far, this team has described the distribution of past open science tool users among schools and departments at CMU, with faculty and Ph.D. students being the most common academic positions held by users. As these efforts continue, the team is helping us identify the features of our current dataset that will provide statistically significant predictions. Before establishing this partnership, we requested as one of the deliverables a written report describing the development, decision-making process, outputs, and findings so that we and others could reproduce their work. In keeping with our posture of openness, we also asked the team to apply open science practices such as reproducible tools and code and version control. The students have since shared their research materials, including documentation of data, code, and analysis, using open platforms like Open Science Framework.
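As a toy illustration of what such a predictive model might look like (the capstone team's actual model is not described in detail here, and every record below is fabricated), one can score a user's likelihood of adopting open science tools from simple per-feature adoption rates:

```python
# Naive predictive sketch: average per-feature adoption rates.
# All records and the scoring rule are fabricated for illustration only.
from collections import defaultdict

RECORDS = [  # (department, position, adopted_open_science_tool)
    ("statistics", "phd", True),
    ("statistics", "faculty", True),
    ("statistics", "undergrad", False),
    ("english", "faculty", False),
    ("english", "phd", True),
    ("statistics", "phd", True),
]

def fit(records):
    """Estimate an adoption rate for each (feature, value) pair."""
    counts = defaultdict(lambda: [0, 0])  # feature -> [adoptions, total]
    for dept, pos, adopted in records:
        for feat in (("dept", dept), ("pos", pos)):
            counts[feat][0] += adopted
            counts[feat][1] += 1
    return {f: a / n for f, (a, n) in counts.items()}

def score(rates, dept, pos):
    """Average the rates of the user's features; 0.5 for unseen values."""
    feats = [("dept", dept), ("pos", pos)]
    return sum(rates.get(f, 0.5) for f in feats) / len(feats)

rates = fit(RECORDS)
print(score(rates, "statistics", "phd"))  # prints 0.875
```

A real model would use a principled estimator (e.g., logistic regression) and far richer features, but even this sketch shows how departments and positions with high historical adoption surface as "probable user groups."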
In addition to the existing usage data for open science tools and services, the analysis used a subset of the metrics data for students, along with de-identified demographic information from the CMU registrar and TartanDataSource (TDS) from the university's Institutional Research and Analysis office, all of which can be shared. As library and information professionals, among other titles, we understand the significance of an individual's right to privacy, and as we carry on, preserving this right will remain at the forefront of our minds. As long as our research materials do not contain personally identifiable information that cannot be anonymized, our team plans to share any code and scripts, documentation, and other information openly, since one of our objectives is for this system to serve as a blueprint for other academic libraries.

To design service models that accommodate recommendations from our system and accurately meet the needs of the campus community, our team is also investigating researchers' perspectives and the implementation of open science tools in educational settings. Through 2023, the Libraries' open science program is conducting a needs assessment and environmental scan that includes focus group interviews with researchers from diverse areas of study and identifies open science services and practices among peer institutions and other units at CMU.
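The commitment above to share only materials whose personally identifiable information can be removed could be operationalized with a release-time check such as k-anonymity: every combination of quasi-identifiers must describe at least k people. This is a generic sketch, not the project's stated procedure, and the field names are hypothetical rather than the actual registrar or TDS schema.

```python
# Illustrative k-anonymity check before sharing de-identified user data.
# Quasi-identifier names ("dept", "level") are hypothetical placeholders.
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k=5):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

rows = ([{"dept": "statistics", "level": "phd"}] * 6
        + [{"dept": "english", "level": "masters"}] * 2)
print(is_k_anonymous(rows, ["dept", "level"], k=5))  # prints False
```

Here the two-person english/masters group is small enough to risk re-identification, so the dataset would fail the check and need further generalization or suppression before release.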
In summer 2023, Chasz Griego will lead an eight-week undergraduate course, hosted through the Office of the Vice Provost for Education, offering students opportunities to use open science tools and assess how these tools influence collaboration and reproducibility in research. This course will serve as a testing space to investigate how open science tools and practices can be implemented within educational settings at CMU. Our findings from these assessments and collaborations will guide us in developing a proof-of-concept recommendation system that will introduce Carnegie Mellon University students, faculty, and staff to existing open science tools and resources. Developing a predictive model and associated recommendation system is a novel approach to scaling adoption of, and engagement with, academic tools at a university like Carnegie Mellon. Eventually, we will need to instantiate a functional version of the predictive-model-driven recommendation system so that it can begin effectively delivering resource recommendations to educators. The work proposed here will meet this substantial challenge and make successful implementation possible.

Ethical Considerations

Within our preliminary exploration, our team has already identified ethical challenges in three areas related to developing both a predictive model and a recommendation system. First, many of our concerns connect to users' rights, such as the use of personally identifiable information and privacy. Second, this project leverages student labor in exchange for academic credit and learning opportunities. Finally, this project is designed to promote specific tools and practices through persuasive design.

Data collection for our system will include personally identifiable information that can be used to describe patron behavior with respect to library tools and services.
Generally speaking, predictive and recommendation systems take information about a user's preferences as input and predict as output an item that is likely to meet the user's needs. By their underlying nature, the collection and curation of vast amounts of personal information is inevitable for generating personalized recommendations. On the surface, these systems appear user-centered because they generate curated content, but many are driven by business objectives and applications, which leads to less consideration of the user and their privacy. More often than not, user data is collected and analyzed without the consent or knowledge of the user, and even when users are aware that data is being collected, they likely do not understand its actual or intended uses. In our pursuit, we are seeking approaches that will allow us to design a recommendation system that curates open science resources, takes users' rights into account, and carefully balances the risks around privacy, accuracy, fairness, and explainability without merely shifting responsibility onto the users. For example, one solution might be to embrace a macro-ethical approach, which considers ethical problems related to data, algorithms, and practices, and how those problems relate to, depend on, and impact each other (Milano, Taddeo, and Floridi 2020).

This project currently entails collaboration with students individually and in teams and will likely expand in the coming year to include other graduate and undergraduate students working for credit, hourly, or for free. Students regularly engage in generative activity as part of their educational experiences, and there is substantial literature about the effective design of capstone courses (Tenhunen et al. 2023); recent literature also addresses the ownership and intellectual property associated with work that students complete in the course of their education (Allen 2021). As this case study is being written, an undergraduate triple major in information systems, human-computer interaction, and business administration is volunteering approximately five hours per week as a project manager on this project. This student has a background in AI research and development and an interest in developing project management skills. To ensure that she gains useful professional and academic experience, we worked with her to identify learning outcomes and desired skills and to agree on communication and collaboration strategies. We have asked her to create project plans and visualizations, such as Gantt charts, maintain records, and manage communications. To ensure that data and products of work are handled ethically, we used a collaboration agreement and discussed the need for explicit communication about future use of project assets. We expect that she will use visualizations and other assets as part of her portfolio. This student will gain substantial educational benefit through the hands-on learning experiences that our collaboration requires, and the project will gain several durable assets that will outlast her collaboration on the project.

This project was integrated into the Master of Statistical Practice (MSP) graduate curriculum in the spring of 2023. A student team working under faculty supervision undertook data exploration, developed project constraints and documentation, and built a proof-of-concept predictive model that met our specifications. The project team is positioned as a client, and related student efforts will be evaluated and graded as a capstone project to meet master's degree requirements.
Faculty associated with the capstone course and project will guide student work and frame the experience to best serve students' educational goals. While these efforts offer curricular and educational benefits to the students, they can also be understood as an appropriation of student labor to produce university assets: developing a predictive model is a non-trivial task requiring a substantial investment of time and resources, and these resources are extracted from students as a component of their degree requirements.

Following the pattern of other recommendation designs, our system seeks to help users discover new services and minimize the cognitive information overload that exists in academic settings. Yet we are grappling with the inherent persuasive design of recommendation models. How can we build a model that is genuinely helpful and protects users' autonomy without exerting undue or unwanted influence toward library services or introducing bias? Fortunately, there are a number of approaches we can explore for building a recommendation system, yet they all generally involve constructing a user model or profile. A user profile is a set of characteristics and/or preferences for a given user that the system uses to make personalized recommendations. Although we do want users to receive curated services, these constructions can limit the range of options recommended to users, because they place users into categories (e.g., department, academic level). As a result of this algorithmic classification, an individual's ability to make self-driven, reflective decisions about which services are extended to them is hindered, and ultimately users are nudged toward a particular outcome.
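One partial remedy for this nudging effect, sketched below with invented service names rather than the project's actual design, is to reserve a slot in every recommendation list for services from outside the user's inferred category, so the profile never fully determines what a user sees:

```python
# Sketch: profile-based picks plus one "exploration" slot drawn from outside
# the user's inferred category. Catalog entries are invented examples.
import random

CATALOG = {
    "data_management_consult": "research",
    "osf_training":            "research",
    "carpentries_workshop":    "teaching",
    "oer_support":             "teaching",
}

def recommend(profile_category, n=2, explore=1, seed=None):
    rng = random.Random(seed)
    in_cat = [s for s, c in CATALOG.items() if c == profile_category]
    out_cat = [s for s, c in CATALOG.items() if c != profile_category]
    # Fill most slots from the profile's category, the rest from elsewhere.
    return in_cat[: n - explore] + rng.sample(out_cat, explore)

picks = recommend("research", n=2, explore=1)
print(picks)
```

The `explore` parameter makes the autonomy trade-off explicit and tunable: a higher value weakens the category's grip on the list at the cost of less personalized results.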
Although the field of artificial intelligence is still in its infancy, there is already extensive research exploring the ethics of recommendation systems and their effects on the people who use them. Some research suggests strategies to mitigate bias and address autonomy concerns, such as deploying a conversational recommender system that gives users explanations for why a particular recommendation was made (Musto et al. 2019). Musto et al. (2019) found that selections were better received when explanations were provided to users. This general description illustrates how these systems can shape an individual's experience of the digital world. As echoed throughout, our intent is to facilitate the discovery of tools within our community that may benefit users' academic and personal goals, rather than to influence their choices or alter their perception of which services are readily available to them. At this stage, we have more questions than answers. The ethical challenges presented here, along with others we encounter as we move forward, shape how we approach conversations with individuals who have experience designing and deploying predictive and recommendation systems.

Who Is Affected by This Project?

Over the course of the project, we expect many individuals, services, and programs to be involved to some degree. The Libraries' dean has been a steadfast supporter of open science practices in our community, including this project. In 2018, he endorsed the development of the Open Science and Data Collaborations program, spearheaded by three library liaisons. As the highest level of support within the Libraries, he has helped advocate for the many open science resources we offer and paved the way for our open science team to hold discussions and establish relationships with deans from the schools and colleges.
Through these many conversations, we have already seen an uptick in interest in open science practices and services from disciplines across campus (e.g., the Language Technologies Institute and Statistics). Because the open science program was initiated by three library liaisons, it has helped increase internal support from other library service providers (e.g., functional specialists, liaisons, and staff). This support has been valuable for raising awareness about our services and building internal and external partnerships, as each person engages with distinct departments and individuals on campus. As implied above, the open science team comprises a range of specialists whose work is diverse: a staff member who manages and supports the institutional repository, a functional specialist who provides training and support on data curation and literacy, and three library liaisons who support tools and provide training on topics related to open access, data management, scholarly communications, and more. Their involvement in our project is indirect but essential to the ongoing success of the program and the resources we can offer our community. Our project has many moving parts, including partnering with faculty and educators to integrate our services into their curricula, which will in turn involve the students who engage with open science resources through their professors' transformed courses. Altogether, our project can increase awareness of library services, begin to glean user motivations, open up new approaches for gathering information on how users practice open science at an institution, and measure satisfaction and success with each tool.
As our project unfolds, there are potential opportunities to partner with university service areas, including the Student Academic Success Center and the Eberly Center for Teaching Excellence. The Student Academic Success Center facilitates student learning by providing academic coaching, subject-specific tutoring, effective communication strategies, accommodations for students with disabilities, and language support for multilingual learners. The Eberly Center supports faculty, graduate students, and other educators who aim to design courses and curricula that put students at the center of the teaching process. Our effort will enhance our understanding of Carnegie Mellon library service use and provide more effective open science support to the Carnegie Mellon community. More broadly, this project can serve as a proof of concept for other academic libraries, which we hope will build on our work.

Lessons Learned and Future Work

While this project is still in its preliminary stages and much of the work is ongoing, we have already learned several lessons that improve our approach to creating a robust recommendation system while considering the ethics of data usage and AI implementation. Many of these lessons came from our work with the Master of Statistical Practice (MSP) capstone team. Collaborating with statistics faculty and students first revealed the challenge of communicating our goals and intentions in a way that aligns with the knowledge base of these subject specialists. We chose terminology that helped translate our goals into action items that could reasonably be executed by statistics students. However, we did observe some disconnects in the exchange of information; for instance, the students treated the variables in our dataset as arbitrary.
When analyzing trends, such as which academic departments are more likely to use our institutional repository, KiltHub, the students tended to focus on how these trends contribute to model parameters and not to question why a certain department would be more drawn to KiltHub. These points were usually addressed during team meetings, where the faculty adviser led the effort to ask the more subject-specific questions. Overall, the statistics students successfully applied their education to real-world problems and data, but the team encountered challenges connecting the data analysis to the context of problems specific to the University Libraries.

The MSP capstone team delivered preliminary results signaling the need for a larger dataset so that the predictive model can deliver results with higher confidence. Challenges arose when considering ways to obtain expansive data that represents library users at CMU. A major consideration is the privacy of individuals currently or previously affiliated with the university. While our aim is to develop user profiles that describe prior or potential attraction to open science practices, we plan to rely only on user information that is publicly or internally available. However, there are challenges specific to library user data. Under the Fourth Amendment of the United States Constitution, the privacy of patrons who access information from libraries is protected; for instance, these protections shielded patrons from seizures of library records by the Federal Bureau of Investigation under the USA PATRIOT Act.
Appropriately, the CMU Libraries does not record user-specific circulation information, which teaches us that a predictive model for library services cannot be built on such data but must instead draw on other, carefully considered records that describe academic behavior and motivations.

Future work will address approaches for evaluating the effectiveness of recommendations to users. We will develop strategies to assess user responses to the recommended tools or services and how they influence research and/or educational outcomes. This will include developing metrics to measure how the recommendation system supports decision making. To evaluate the performance of decisions, we can refer to the strategies outlined by Jameson et al. (2015) that identify choice patterns based upon attributes, consequences, experience, social conditions, policies, or trial and error. We will gather responses from users through a variety of methods, including electronic surveys, focus groups, and case studies. In the case studies, we will analyze changes in educational outcomes in academic courses that incorporate recommended open science tools and services.

Data Availability

Many of the materials mentioned in this case study can be found in our Open Science Framework project, Open Science Recommendation Systems for Academic Libraries (Beltran, Griego, and Herckis 2024). Please reach out to our team if you have any questions.

Acknowledgements

This research case study was developed as part of an IMLS-funded Responsible AI project, through grant number LG-252307-OLS-22.

Competing Interests

The authors declare that they have no competing interests.

References

Allen, Genevera. 2021. "Experiential Learning in Data Science: Developing an Interdisciplinary, Client-Sponsored Capstone Program." Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, March, 516–522.

Beltran, Lencia, Chasz Griego, and Lauren Herckis. 2024.
"Open Science Recommendation Systems for Academic Libraries." OSF. https://doi.org/10.17605/osf.io/px6hj.

Jameson, Anthony, Martijn C. Willemsen, Alexander Felfernig, Marco de Gemmis, Pasquale Lops, Giovanni Semeraro, and Li Chen. 2015. "Human Decision Making and Recommender Systems." In Recommender Systems Handbook, edited by Francesco Ricci, Lior Rokach, and Bracha Shapira, 611–648. Boston, MA: Springer US. https://doi.org/10.1007/978-1-4899-7637-6_18.

Milano, Silvia, Mariarosaria Taddeo, and Luciano Floridi. 2020. "Recommender Systems and Their Ethical Challenges." AI & Society 35 (4): 957–967. https://doi.org/10.1007/s00146-020-00950-y.

Musto, Cataldo, Fedelucio Narducci, Pasquale Lops, Marco de Gemmis, and Giovanni Semeraro. 2019. "Linked Open Data-Based Explanations for Transparent Recommender Systems." International Journal of Human-Computer Studies 121 (January): 93–107. https://doi.org/10.1016/j.ijhcs.2018.03.003.

Tenhunen, Saara, Tomi Männistö, Matti Luukkainen, and Petri Ihantola. 2023. "A Systematic Literature Review of Capstone Courses in Software Engineering." arXiv. http://arxiv.org/abs/2301.03554.

Wang, Huajin, Melanie Gainey, Patrick Campbell, Sarah Young, and Katie Behrman. 2022. "Implementation and Assessment of an End-to-End Open Science & Data Collaborations Program [version 2; peer review: 2 approved]." F1000Research 11:501. https://doi.org/10.12688/f1000research.110355.2.
Automatic Expansion of Metadata Standards for Historic Costume Collections

Journal of eScience Librarianship 13 (1): e845
DOI: https://doi.org/10.7191/jeslib.845
ISSN 2161-3974

Full-Length Paper

Caleb McIrvin, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Chreston Miller, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA, chmille3@vt.edu
Dina Smith-Glaviana, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Wen Nie Ng, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA

Abstract

Objective: This project focuses on artificial intelligence (AI) supported enhancement of descriptive metadata for fashion collections (otherwise known as costume or dress and textile collections) through expanding costume-specific controlled terms. The authors use natural language processing (NLP) techniques along with a human-in-the-loop process to support selection of descriptive terms for inclusion in the controlled terms of a metadata schema. This project seeks to expand upon an existing domain-specific schema, Costume Core, by enhancing the schema with a comprehensive set of descriptors. This enhancement will allow for more accurate and detailed descriptions of costume artifacts. This article describes this process and the outcomes of AI approaches for providing this metadata expansion, who this process is for, ethical considerations, and lessons learned.
Received: November 15, 2023. Accepted: February 5, 2024. Published: March 6, 2024.

Keywords: natural language processing, library, word embeddings, metadata, costume collections, artificial intelligence, AI

Citation: McIrvin, Caleb, Chreston Miller, Dina Smith-Glaviana, and Wen Nie Ng. 2024. "Automatic Expansion of Metadata Standards for Historic Costume Collections." Journal of eScience Librarianship 13 (1): e845. https://doi.org/10.7191/jeslib.845.

Data Availability: The data presented in this study are available on request from the corresponding author.

The Journal of eScience Librarianship is a peer-reviewed open access journal. © 2024 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See https://creativecommons.org/licenses/by/4.0.

Methods: We investigated the use of word embeddings to support suggesting new metadata terms. Several word embedding models were applied, with the most descriptive one chosen for final use in a human-in-the-loop selection process. This selection process enabled domain experts to identify which of the terms chosen by the model are of relevant value. We then compared what the domain experts chose with what the model produced to gauge the value the model provides in the process of metadata expansion.

Results: The metadata expansion process was a success.
An AI-supported process aided domain experts in choosing relevant terms to include in their metadata schema. The results were therefore threefold: a methodology for applying the identified AI models to the problem, an interactive software system to aid the domain experts, and an approach for evaluating the outcome.

Conclusion: The application of AI technologies (word embeddings) provided a successful pipeline for supporting domain experts in expanding the metadata schema with additional descriptors. Enhancing the metadata schema with additional descriptors improves its usability for fashion collection managers and allows for a more precise description of the artifacts. As a result, many expertly chosen new terms were added to the metadata schema.

Introduction

We postulate that problems in cataloging efforts in the historical costuming domain can be mitigated through implementation of a standardized metadata schema. Existing metadata schemas that utilize controlled descriptive terminology for fashion artifacts, such as historic costume or dress, and items related to the process and product of dressing the body (Eicher and Evenson 2014), which include clothing, textiles, and accessories, are often constrained by an insufficient number of description fields and a limited vocabulary set. By expanding the number of available terms using natural language processing methods, we can develop high-quality, consistent metadata, enabling better data sharing across collections and increasing cataloging accuracy, resulting in improved searchability of dress records. Confirming generated descriptor choices via a human-in-the-loop approach allowed us to alleviate ethical concerns surrounding the quality of the descriptors chosen for our updated metadata schema. This approach also allowed us to ensure that our descriptor selections for the updated schema drew from an ethnically diverse array of sources while avoiding misleading or culturally insensitive terms.
Additionally, we emphasized adding terms that would allow for inclusive language, for example enhancing the use of colloquial terms such as "robe," which could refer either to a bathrobe or to traditional Chinese robes, by adding more precise terms.

Project Details

This project was a collaboration between the Virginia Tech Fashion Merchandising and Design (FMD) department and the Virginia Tech University Libraries. An expert in dress and students from the FMD department participated in selecting terms. The expert supervised the student assistants and was the final decision-making authority on choosing the descriptors used to expand the metadata schema. The data and informatics consultant within the University Libraries worked with an undergraduate computer science student to develop the natural language processing (NLP) approach and provide analysis of the resulting chosen descriptors. Our dress domain expert collaborated with our digital collections specialists within the University Libraries to support the process of finalizing the expanded metadata schema. The collection used for this work was the Oris Glisson Historic Costume and Textile Collection. It was a suitable collection because the dress domain expert had previously cataloged its items using Costume Core (Kirkland 2018), the metadata schema our project strives to expand. To aid in this expansion, we used a pretrained NLP model to generate word embeddings. These word embeddings can be used to identify descriptors that have conceptually similar meanings, but they can also identify descriptors that are conceptually a little "further" away. This allows us to introduce descriptor diversity and explore terms that were previously not present in the schema.
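The similarity step can be illustrated with a toy example. A real pipeline would query a pretrained embedding model; here we hand-craft three-dimensional vectors (entirely fabricated, with no real linguistic meaning) and rank candidate terms by cosine similarity to an existing descriptor:

```python
# Toy descriptor-suggestion sketch. The vectors are hand-made stand-ins
# for pretrained word embeddings; real embeddings have hundreds of
# dimensions and are learned from large text corpora.
from math import sqrt

VECTORS = {
    "robe":     [0.9, 0.1, 0.0],
    "bathrobe": [0.8, 0.2, 0.1],
    "kimono":   [0.7, 0.1, 0.3],
    "trousers": [0.0, 0.9, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm

def suggest(term, top_n=2):
    """Candidate descriptors ranked by similarity to an existing term."""
    scores = {t: cosine(VECTORS[term], v)
              for t, v in VECTORS.items() if t != term}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(suggest("robe"))
```

Lowering the similarity cutoff (or taking a larger `top_n`) is what surfaces the conceptually "further" candidates mentioned above, at the cost of more noise for the human reviewers to filter.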
To help sift through the identified potential descriptors, we developed an application, hosted on a lightweight server, to aid in reviewing the NLP-suggested descriptors. The main challenge we encountered in our pipeline was how best to share the NLP results with the students and the dress domain expert for selection and confirmation. We additionally emphasized ethical consciousness when generating the terms, ensuring that the terms we generated and confirmed for addition to the schema would help alleviate the confusion of costume collection users.

Background

An accurate understanding of historical cultural eras allows historians and fashion experts to better make judgments about the social values of a period. To achieve this understanding, it is important to consider the historical costuming conventions of the day. Recognizing the importance of costume artifacts to a proper cultural understanding, many universities, libraries, and museums have amassed large collections of historical and contemporary dress items. However, these pieces are frequently poorly described, with little to no interaction between collections on how to standardize piece descriptions. Although an increased emphasis on the advantages of collection digitization (Zeng 1999) has appeared in recent years, there is a dearth of research in this area regarding standardization of piece description.

Metadata expansion efforts can be found across a variety of fields, though relatively few use NLP to improve development speed. In the fashion domain specifically, several pushes have been made toward a unified ontology through metadata expansion. The Kent State University collection that utilized Dublin Core was one such effort (Zeng 1999). Additionally, Costume Core (Kirkland 2018), upon which our metadata schema is built, serves as an effective groundwork for metadata expansions.
both of these schemas, however, suffer from a lack of granularity, having fewer metadata levels or controlled terms than accurate cataloging requires. valentino (2017) presents another such linked metadata schema. ryerson university has also made efforts to more clearly display its fashion collection using dublin core (eichenlaub, morgan, and masak-mida 2014). a crowdsourcing effort based on costume core uses survey data as a promising tool to combat the lack of generality present in many of the above schemas (cai et al. 2012), but that work is specific to chinese-style costumes. numerous efforts have been made towards ontology development as well, both in the fashion domain and elsewhere. one work in the fashion domain (bollacker, díaz-rodríguez, and li 2016) argues that ontologies taking only garment attributes into account provide insufficient information, and builds a subjective influence network to incorporate more data into the ontology. novalija and leban (2013) construct an ontology of designer garments, connecting pieces based on wikipedia link structures. the primary benefits of a more comprehensive metadata schema are improved searchability and discoverability, and nlp techniques can streamline the expansion of such a schema. outfit2vec (jaradat, dokoohaki, and matskin 2020) uses clothing metadata to build machine learning models that can better recommend garments to consumers. tahsin et al. (2016) use nlp to extract geographic metadata from text corpora to increase location specificity. one approach towards solving this problem, taken by (cai et al.
2012), is crowdsourcing: the researchers used nlp techniques coupled with input from 100 students regarding metadata element importance. however, those techniques did not generate new descriptors; they only assisted in confirming previously selected categorizations. a standardized set of descriptors needs to be developed so that visitors to digital collections can quickly search for a particular type of garment. costume core, mentioned above, is one such set of descriptors. unfortunately, the scope of costume core is limited by its size: many potentially useful descriptors and several valuable categories are left out, restricting the utility of the project. in our work, we seek to expand the costume core vocabulary by using nlp techniques to efficiently identify new descriptors previously not included in the schema, enhancing digital collection cataloging and search capabilities.

methodology

our process to identify high-quality costume descriptors consisted of multiple steps. first, we generated hundreds of potential descriptors from the initial costume core schema using word embeddings, an nlp technique. next, we used our model output confirmative helper application (mocha) to facilitate the review process by our trained fashion students. finally, our domain expert reviewed all selections, trimming the choices down to ensure quality.

to obtain the initial descriptors used in similarity generation, we adapted a popular set of descriptors found in the costume core vocabulary commonly used in fashion metadata description tasks (kirkland 2018). while the keywords contained in the vocabulary were acceptable in many cases, some categories could be removed, as new, meaningful descriptors were unlikely to be generated for the category.
one such example is the “socio-economic class” category, which, in the original costume core vocabulary, contains the descriptors “middle class,” “upper class,” and “working class.” as models are unlikely to create useful descriptors here, this and similar categories were removed from our analysis. in addition, slight manual lemmatizations (such as changing “coatdresses” to “coatdress”) were made to generate more accurate predictions.

data preprocessing

additional data pre-processing was necessary to convert the keywords to a format usable by our selected models. as the keywords were initially stored in an excel file, we needed to convert them to a format more conducive to model input. to accomplish this, we removed characters our models would not recognize, such as “$” and “!,” using a regular expression. additionally, we performed some minor manual tweaking of the initial keyword selections to maximize the number of potential new descriptors output by our model.

model selection

similar descriptors can be generated efficiently using a cosine similarity method, a technique that uses the distance between vector representations of tokens, also known as word embeddings, as a measure of similarity between words. word2vec from gensim, a python library for working with word embeddings, provides functionality to quickly and easily generate the most similar words in a model's vocabulary. for accurate comparison testing, we tested three separate models: google news, mpnet, and sentence-t5. while these models were not specifically fine-tuned on fashion literature, they were still capable of accurately modeling the relationships between descriptors, as seen in figure 2. after initial data analysis and feedback from our reviewers, we narrowed our focus to the google news model alone.
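the character-stripping step described above can be sketched with python's re module; the exact set of characters retained here is an assumption for illustration, not the project's actual rule.

```python
import re

def clean_keyword(keyword: str) -> str:
    # drop characters an embedding model's vocabulary would not
    # recognize, e.g. "$" and "!" (illustrative character set)
    return re.sub(r"[^a-z0-9 \-]", "", keyword.lower()).strip()

raw = ["coatdresses!", "$silk", "empire waist"]
cleaned = [clean_keyword(k) for k in raw]
# cleaned == ["coatdresses", "silk", "empire waist"]
```

the same pattern generalizes to any punctuation the chosen model's tokenizer cannot handle.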
initial costume core network data visualization

in order to understand the different connections between costume core keywords as captured by the models, we created visualizations of the relationships between keywords. to create these visualizations, we first loaded in the costume core keywords, organized by category, as well as our three models. for each model, we iterated over the keywords pairwise, calculating the cosine similarity between keywords and storing values greater than a set threshold in a tabular format (figure 1). we then used the python networkx library (hagberg, swart, and schult 2008) to export a graph in a format readable by the orange software (demšar et al. 2013). in addition, we created separate files to specify the orange visualization format by providing additional keyword category information. to assess the quality of the model's representations, we graphed our initial descriptors by category, connecting them by the strength of their cosine similarity weights. as seen in figure 2, the model performs reasonably well at clustering similar costume core keywords, as evidenced by the tightly clustered “color” and “material” categories.

figure 1: cosine similarity weights between the costume core keywords (keyword_1) and their most similar tokens (keyword_2) as predicted by the model (weight).

figure 2: google news representation of costume core keywords. colors represent categories of descriptors, e.g. material, neckline, technique, etc.

descriptor generation

once we had our descriptors and models loaded into the correct format, we ran the descriptors through our word2vec models to generate new potential descriptors for later analysis.
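the pairwise computation behind the figure 1 table reduces to a cosine between embedding vectors (gensim's similarity queries wrap the same arithmetic); the toy 4-dimensional vectors, keywords, and 0.5 threshold below are illustrative stand-ins for real 300-dimensional word2vec embeddings.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    # cosine of the angle between two word vectors: close to 1.0
    # for conceptually similar terms, near 0.0 for unrelated ones
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# toy embeddings (invented values, not real model output)
embeddings = {
    "silk":  [1.0, 0.0, 0.5, 0.1],
    "satin": [0.9, 0.1, 0.5, 0.0],
    "plow":  [0.0, 1.0, 0.0, 0.9],
}

# keep only keyword pairs whose similarity clears a set threshold,
# as rows (keyword_1, keyword_2, weight) for a figure 1-style table
threshold = 0.5
edges = [
    (w1, w2, round(cosine_similarity(embeddings[w1], embeddings[w2]), 4))
    for w1, w2 in combinations(embeddings, 2)
    if cosine_similarity(embeddings[w1], embeddings[w2]) > threshold
]
# edges keeps only the silk/satin pair; plow falls below the threshold
```

the resulting rows can then be fed to networkx as weighted edges and exported for visualization.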
for each valid costume core descriptor, we generated the top 25 most similar potential descriptors. after the models finished generating these candidates, we saved the results to separate .csv files; these potential descriptors contained the information needed for later analysis.

supporting human-in-the-loop term selection

before the potential descriptors generated by the model could be released, we needed a way to confirm that they were actually valid descriptors for historical costuming metadata. we determined that the most effective way of confirming these descriptors was a human-in-the-loop approach, in which fashion metadata experts would check the model-selected descriptors, selecting the most accurate and relevant ones. this approach is partially motivated by the fact that their selections allow us to calculate several statistics measuring the accuracy of the models, demonstrating the effectiveness of nlp models in generating new descriptors in the fashion metadata domain. our process is displayed in more detail in figure 3.

figure 3: descriptor confirmation loop using a model output confirmative helper application (mocha).

to expedite the process and allow our domain experts to process the generated descriptors easily and efficiently, we created the model output confirmative helper application (mocha) to present the model-generated descriptors. to operate the web application, users load in model-generated words, at which point they can visually select a subset of descriptors to classify as confirmed. these model-generated words came in the form of the top 25 descriptors similar to each token in the costume core vocabulary. multiple usability enhancements allow users to navigate quickly between pages of descriptors, in case labeling needs to be broken up into multiple sessions. functionality for clearing selected descriptors adds further flexibility.
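saving each keyword's top-n candidates to csv for reviewer confirmation might look like the following sketch; the candidate terms and scores are invented, and io.StringIO stands in for a .csv file on disk.

```python
import csv
import io

# hypothetical model output: keyword -> top-3 (candidate, score) pairs;
# the project used the top 25 per keyword
top_similar = {
    "bodice": [("neckline", 0.71), ("waistband", 0.64), ("corset", 0.60)],
    "brocade": [("damask", 0.75), ("jacquard", 0.69), ("velvet", 0.58)],
}

buffer = io.StringIO()  # stands in for a .csv file on disk
writer = csv.writer(buffer)
writer.writerow(["keyword", "candidate", "cosine_similarity"])
for keyword, candidates in top_similar.items():
    for candidate, score in candidates:
        writer.writerow([keyword, candidate, score])

rows = buffer.getvalue().splitlines()
# rows: one header line plus one line per (keyword, candidate) pair
```

one file per top-n cutoff (25, 20, 15, 10, 5) reproduces the project's file layout.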
after all descriptors have been confirmed or rejected, or after a labeling session has ended, users can download the descriptors they have confirmed for analysis. to confirm our selections, we first had a group of three trained fashion students confirm model-generated descriptors using the web application, as seen in figure 4. descriptors were loaded into the web application as textual data, where they appeared as a word cloud in column 2. reviewers clicked on terms to select beneficial ones, which then appeared in column 3 and could be downloaded for analysis. terms which remained unselected were placed in column 1, where they were discarded. after our students had finished reviewing potential descriptors, our domain expert edited and revised the students' selections. after finalizing the revision, we converted the collected confirmed descriptors to a form more suitable for analysis, combining the descriptors approved by the domain experts into a single file. based on input from our domain expert and students on the relevance of the descriptors produced by the three models, we decided to proceed with the google news model, as its descriptors were found to be more relevant to the domain-specific task. as a result, the analyses presented in the following section were obtained from descriptors generated by the google news word2vec model.

results

as mentioned above, the domain experts processed the top 25 similar words for each word in the costume core vocabulary. we also created files of the top 20, 15, 10, and 5 most similar words, as generated by the models. below are plotted graphs of cosine similarity score (x-axis) versus the percentage of words having that cosine similarity score (y-axis), for both model-generated and reviewer-confirmed tokens.
as expected, the two graphs in figure 5 show that the top 5 most similar words generated by the models have higher cosine similarity values on average than the top 25 most similar words.

figure 4: mocha application. column 1 stores descriptors not selected by our reviewers, column 2 stores descriptors currently being processed, and column 3 stores descriptors confirmed by our reviewers.

however, the measure of the model's efficacy in predicting descriptors is the gap between the cosine similarity scores of the confirmed descriptors and the overall generated descriptors. if the model's predictions are accurate, we would expect words with higher cosine similarity scores to have a larger chance of being confirmed by our domain experts.

figure 5: comparison of top n overall terms (left) with top n confirmed terms (right), plotting cosine similarity score against the percentage of descriptors with the same score.

as seen from figure 6, there is a clear distinction between the original, model-generated descriptors and the descriptors that were actually confirmed. to further demonstrate this difference, we show the relative averages of confirmed and overall descriptors in table 1: consistently, across the different groups of generated versus confirmed terms, the confirmed descriptors had a higher cosine similarity score on average than the overall model-generated descriptors.
this gap between generated and confirmed terms leads us to believe that our model was effective at generating high-quality descriptors: descriptors which the model deemed more likely to be included in the schema (those with higher cosine similarity scores) were indeed selected more frequently on average.

table 1: relative averages of confirmed and overall descriptors, with additional statistics on term cosine similarity scores. hit rate = % of generated descriptors that were confirmed.

                     top 25   top 20   top 15   top 10   top 5
confirmed cs score   0.6063   0.6145   0.6244   0.6329   0.6575
overall cs score     0.5688   0.5777   0.5901   0.6071   0.6370
hit rate             14.2%    15.6%    17.8%    21.3%    27.5%

ethical considerations

the main objective of this work was to minimize the confusion felt by many costume collection users by providing an expanded set of metadata descriptors for use in cataloging efforts. an important priority for us was to ensure that our generated descriptors were diverse enough to encompass a wide array of periods and cultures, in order to avoid discriminatory exclusions of garment types. in pursuit of this goal, we used models trained on a wide variety of sources to generate our descriptors, ensuring that these models would have been exposed to data from many different areas and hopefully alleviating many of these potential concerns. in addition, our human-in-the-loop approach provided an additional layer of protection, allowing us to exclude potentially negative or harmful terms from being added to our schema. one point worth considering is the impact of the human-in-the-loop approach on term bias: did having a single final reviewer create an opportunity for bias?
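the quantities in table 1 reduce to a few lines of arithmetic; the descriptor scores and confirmations below are invented for illustration and are not the project's data.

```python
def descriptor_stats(generated, confirmed):
    """generated: dict of descriptor -> cosine similarity score;
    confirmed: set of descriptors the reviewers accepted."""
    overall_avg = sum(generated.values()) / len(generated)
    confirmed_scores = [s for d, s in generated.items() if d in confirmed]
    confirmed_avg = sum(confirmed_scores) / len(confirmed_scores)
    hit_rate = len(confirmed_scores) / len(generated)  # share confirmed
    return overall_avg, confirmed_avg, hit_rate

generated = {"damask": 0.75, "jacquard": 0.69, "velvet": 0.58, "plow": 0.20}
confirmed = {"damask", "jacquard"}
overall_avg, confirmed_avg, hit_rate = descriptor_stats(generated, confirmed)
# as in table 1, confirmed terms average a higher score than the
# overall pool (about 0.72 vs 0.555), with a hit rate of 0.5
```

computed per top-n cutoff, these three numbers reproduce the rows of table 1.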
to combat this, our initial term selections were made by separate reviewers, so that our fashion metadata expert only made final confirmation decisions on descriptors judged to be valuable by a variety of sources. additionally, our fashion metadata expert has extensive experience with different terminologies and is knowledgeable about best practices in the field, two characteristics which help ensure the quality of our schema. these measures, taken on both the generation and filtration ends of the process, serve to minimize the risk of potentially harmful terms being added to the schema. however, despite these precautions, it may still be valuable to consolidate opinions from a wider variety of stakeholders, such as users, curators, and collection managers.

who is affected by this project?

our efforts to use nlp to select accurate controlled vocabularies can benefit any professional or institution that manages fashion collections and artifacts, including private archives and museums owned by fashion companies such as michael kors and armani silos (franceschini 2019), regional historical societies and museums, and university fashion study collections (green and reddy-best 2022), as it allows them to select from controlled vocabularies that precisely describe and catalog artifacts. this project can also benefit digital librarians and other personnel collaborating with fashion domain professionals to create online digital libraries. because the use of nlp has led to the addition of accurate and sufficient metadata elements to costume core, which provides a means for structuring data (kirkland 2018), digital librarians can more easily map costume core vocabularies to those of pre-existing schemas such as dublin core when preparing to export metadata to online portals and aggregators.
several university fashion collections have committed to using the costume core metadata schema to support two inter-institutional projects that aim to standardize metadata across the historic dress and fashion domain (kirkland 2018). standardizing the metadata also has implications for online users of digital fashion collections: without standardized metadata, online users may experience failed searches, which limit the reach and accessibility of online fashion digital collections. thus, the benefits of nlp may extend to online users, as it contributes to an initiative to standardize metadata within the fashion domain.

lessons learned and future work

while our process was fairly straightforward, there were a few issues we encountered along the way that, if not properly addressed, could have become stumbling blocks. one such area was our choice of model used to generate the terms. two models we originally attempted to use were deemed unsatisfactory for our use case due to the low quality of terms generated. however, trying a variety of models allowed us to select one, google news, with excellent representations of our space. the model outputs and processing code are currently being prepared for dissemination, but the categorization tool code (mocha) is available. another issue we encountered was sharing our web application. bundling up the tool and sending the files via a messaging service seemed likely to cause version control issues, as well as being potentially difficult to set up for non-technical users. as a result, we found it efficient to set our application up on aws lightsail, a virtual server service designed for running lighter-weight applications like mocha. this provided an easily accessible platform for our fashion students and domain experts while allowing us to perform minor updates easily, without needing to resend large files after every update.
as for future work, we would like to create a visual thesaurus tool with costume core metadata (old and new) to help catalogers choose the most accurate term(s). also, because kirkland et al. (2023) found that users searched for holdings on historical collection websites using retail or lay terminology, future work may include reviewing fashion lay and retail terms, comparing them with other established controlled vocabularies, including the international council of museums (icom) vocabulary of basic terms for cataloguing costume and the getty art and architecture thesaurus, and adding them to costume core. the mocha categorization tool is available at https://github.com/calebmcirvingithub/mocha/.

data availability

the data presented in this study are available on request from the corresponding author.

acknowledgements

this project was made possible by an internal grant from the virginia tech university libraries. the research case study was developed as part of an imls-funded responsible ai project, through grant number lg-252307-ols-22.

competing interests

the authors declare that they have no competing interests.

references

bollacker, kurt, natalia díaz-rodríguez, and xian li. 2016. “beyond clothing ontologies: modeling fashion with subjective influence networks.” paper presented at knowledge discovery and data mining: machine learning meets fashion: data, algorithms and analytics for the fashion industry, san francisco, august 13-17. https://www.researchgate.net/publication/304762196_beyond_clothing_ontologies_modeling_fashion_with_subjective_influence_networks.

cai, yundong, yin-leng theng, qimeng cai, zhi ling, yangbo ou, and gladys theng. 2012. “crowdsourcing metadata schema generation for chinese-style costume digital library.” in the outreach of digital libraries: a globalized resource network, edited by hsin-hsi chen and gobinda chowdhury, 97–105.
lecture notes in computer science. berlin, heidelberg: springer. https://doi.org/10.1007/978-3-642-34752-8_13.

demšar, janez, tomaž curk, aleš erjavec, črt gorup, et al. 2013. “orange: data mining toolbox in python.” journal of machine learning research 14: 2349–2353. https://jmlr.org/papers/volume14/demsar13a/demsar13a.pdf.

eichenlaub, naomi, marina morgan, and ingrid masak-mida. 2014. “undressing fashion metadata: ryerson university fashion research collection.” paper presented at the international conference on dublin core and metadata applications, october, 191–195. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=4864f92bb4b5781847a5d3f2e3691d0a68c7e290.

eicher, joanne b., and sandra lee evenson. 2014. the visible self: global perspectives on dress, culture and society. usa: bloomsbury publishing. https://www.isbns.net/isbn/9781609018702.

franceschini, marta. 2019. “navigating fashion: on the role of digital fashion archives in the preservation, classification and dissemination of fashion heritage.” critical studies in fashion & beauty 10 (1): 69–90. https://doi.org/10.1386/csfb.10.1.69_1.

green, denise nicole, and kelly l. reddy-best. 2022. “curatorial reflections in north american university fashion collections: challenging the canon.” critical studies in fashion & beauty 13 (1): 7–20. https://doi.org/10.1386/csfb_00035_2.

hagberg, aric, pieter j. swart, and daniel a. schult. 2008. “exploring network structure, dynamics, and function using networkx.” la-ur-08-5495. los alamos national laboratory (lanl), los alamos, nm (united states). https://www.osti.gov/biblio/960616.
jaradat, shatha, nima dokoohaki, and mihhail matskin. 2020. “outfit2vec: incorporating clothing hierarchical metadata into outfits’ recommendation.” in fashion recommender systems, edited by nima dokoohaki, 87–107. lecture notes in social networks. cham: springer international publishing. https://doi.org/10.1007/978-3-030-55218-3_5.

kirkland, arden. 2018. “costume core: metadata for historic clothing.” visual resources association bulletin 45 (2): 6. https://online.vraweb.org/index.php/vrab/article/view/36.

kirkland, arden, monica sklar, clare sauro, leon wiebers, sara idacavage, and julia mun. 2023. “‘i’m not searching the right words’: user experience searching historic clothing collection websites.” the international journal of the inclusive museum 16 (1): 119–146. https://doi.org/10.18848/1835-2014/cgp/v16i01/119-146.

novalija, inna, and gregor leban. 2013. “applying nlp for building domain ontology: fashion collection.” paper presented at the conference on data mining and data warehouses (sikdd).
https://ailab.ijs.si/dunja/sikdd2013/papers/novalija-fashioncollection.pdf.

tahsin, tasnia, davy weissenbacher, robert rivera, rachel beard, mari firago, garrick wallstrom, matthew scotch, and graciela gonzalez. 2016. “a high-precision rule-based extraction system for expanding geospatial metadata in genbank records.” journal of the american medical informatics association 23 (5): 934–941. https://doi.org/10.1093/jamia/ocv172.

valentino, maura. 2017. “linked data metadata for digital clothing collections.” journal of web librarianship 11 (3–4): 231–240. https://doi.org/10.1080/19322909.2017.1359135.

zeng, marcia lei. 1999. “metadata elements for object description and representation: a case report from a digitized historical fashion collection project.” journal of the american society for information science 50 (13): 1193–1208. https://doi.org/10.1002/(sici)1097-4571(1999)50:13<1193::aid-asi5>3.0.co;2-c.

using ai/machine learning to extract data from japanese american confinement records

journal of escience librarianship 13 (1): e850 doi: https://doi.org/10.7191/jeslib.850 issn 2161-3974

full-length paper

using ai/machine learning to extract data from japanese american confinement records

mary elings, university of california berkeley, berkeley, ca, usa, melings@berkeley.edu
marissa friedman, university of california berkeley, berkeley, ca, usa
vijay singh, doxie.ai, san jose, ca, usa

abstract

purpose: this paper examines the use of artificial intelligence/machine learning to extract a more comprehensive data set from a structured “standardized”
form used to document japanese american incarcerees during world war ii.

setting/participants/resources: the bancroft library partnered with densho, a community memory organization, and doxie.ai to complete this work.

brief description: the project digitized the complete set of form wra-26 “individual records” for more than 110,000 japanese americans incarcerated in war relocation authority camps during wwii. the library utilized ai/machine learning to automate text extraction from over 220,000 images of a structured “standardized” form; our goal was to improve upon and collect information not previously recorded in the japanese american internee data file held by the national archives and records administration. the project team worked with technical, academic, legal, and community partners to address ethical and logistical issues raised by the data extraction process, and to assess appropriate access options for the dataset(s) and digitized records.

received: november 15, 2023 accepted: february 5, 2024 published: march 6, 2024

keywords: libraries, artificial intelligence, ai, machine learning, archives, japanese american incarceration, world war ii

citation: elings, mary, marissa friedman, and vijay singh. “using ai/machine learning to extract data from japanese american confinement records.” journal of escience librarianship 13 (1): e850. https://doi.org/10.7191/jeslib.850.

the journal of escience librarianship is a peer-reviewed open access journal. © 2024 the author(s). this is an open-access article distributed under the terms of the creative commons attribution-noncommercial-sharealike 4.0 international license (cc by-nc-sa 4.0), which permits unrestricted use, distribution, and reproduction in any medium for non-commercial purposes, provided the original author and source are credited, and new creations are licensed under the identical terms. see https://creativecommons.org/licenses/by-nc-sa/4.0.
abstract continued

results/outcome: using ai/machine learning increased the quality of the data extracted from the digitized wwii-era forms.

evaluation method: in a comparison of the earlier dataset, extracted from 1940s computer punch cards, with the current dataset extracted using ai/machine learning, the ai/machine learning approach showed marked improvement.

project description

with funding from a 2019 national park service japanese american confinement sites grant, the bancroft library digitized the complete set of form wra-26 “individual records” for more than 110,000 japanese americans incarcerated in war relocation authority camps during wwii. the library partnered with doxie.ai to utilize ai/machine learning to automate text extraction from over 220,000 images; our goal was to improve upon and collect information not previously recorded in the japanese american internee data file held by the national archives and records administration (nara). the project team worked with technical, academic, legal, and community partners to address ethical and logistical issues raised by the data extraction process, and to assess appropriate access options for the dataset(s) and digitized records.

overview

this project offered our library the first opportunity to use ai/machine learning to improve data extraction from a digitized historical resource. our goal was to enhance access to the information held within that resource and ultimately support emerging scholarship and computational analysis. because the expertise did not exist in our library, we partnered with a team of data scientists. their role was to develop a custom machine learning pipeline for the data extraction.
our role was to facilitate that work, provide guidance and content expertise to the data scientists, and review and quality control (qc) the results.

narrative summary

with funding from a 2019 national park service japanese american confinement sites grant, the bancroft library digitized the complete set of form wra-26 “individual records” for more than 110,000 japanese americans incarcerated in war relocation authority camps during wwii. the library partnered with the data scientists at doxie.ai to utilize ai/machine learning to automate text extraction from over 220,000 images; our goal was to improve upon and collect information not previously recorded in the japanese american internee data file held by the national archives and records administration (nara) (national archives 2024). the project team worked with technical, academic, legal, and community partners to address ethical and logistical issues raised by the data extraction process, and to assess appropriate access options for the dataset(s) and digitized records.

project details

in 2019, the bancroft library at the university of california, berkeley, received funding through the national park service japanese american confinement sites grant program to digitize materials from the japanese american evacuation and resettlement records (banc mss 67/14 c) collection, specifically the complete set of form wra-26 “individual records” for more than 110,000 japanese americans incarcerated in war relocation authority (wra) camps during world war ii (bancroft 2019). the five main goals of this project included: 1. digitizing and creating a preservation copy of the form wra-26 records for future generations, as these records are of enduring and significant historical value; 2.
reaching consensus among community representatives and stakeholders as to how best to provide access to the form 26 material; 3. providing a new, more complete dataset relating to japanese american wwii incarcerees that improves upon the errors, gaps, and omissions in the existing data file generated from computer punch cards created during wwii; 4. testing, creating, and implementing workflows and tools that can help the bancroft library transform its growing digital archival collections into data that can be made available for computational analysis and enhanced access; 5. building, testing, and implementing a sustainable model for integrating community input into our work in alignment with our own responsible access workflows (uc regents 2021).

the project was led by principal investigator (pi) mary elings, interim deputy director and head of technical services, and managed by digital project archivist marissa friedman of the bancroft library. the library contracted with backstage library works to digitize the original forms and with doxie.ai to implement ai/machine learning to extract data from the digitized forms. the pi and digital project archivist worked closely with densho, a community memory organization dedicated to preserving the history of the incarceration, and the uc berkeley office of scholarly communication services to organize a community advisory group meeting (densho 2024). the goal of this meeting was to bring together community experts to advise on how to ethically and responsibly expand access to these sensitive records.

to understand why these collections were selected as the basis for this project, it is helpful to know a bit of the historical context and lifespan of both the analog records and the data these records contain.
from 1942 to april 1943, a census-type two-page form wra-26 (or “individual record”) was used to collect a wide range of demographic, educational, occupational, and biographical data about every japanese american incarcerated in wra camps during the war (figure 1).

figure 1: blank form wra-26 “individual records” (front and back). there were several variations of this form under the same name with slight changes to location and number of fields. courtesy the bancroft library.

figure 2: war relocation authority computer punch card. courtesy densho.

the information included some potentially sensitive personal data and was taken under duress and without consent from forcibly relocated and incarcerated individuals. during the war, data from the form wra-26 records was coded by incarcerated japanese americans and other wra office staffers to early computer punch cards so that the information, some of it generalized into broad categories, could be processed by tabulating machines (figure 2). at the conclusion of the war, a copy of the form wra-26 punch cards and the original typed or handwritten forms from which the punch cards were coded were deposited at the bancroft library along with many other wra records. in the 1960s, the form wra-26 data on the computer punch cards was transferred onto magnetic tape by the library with help from the nascent uc berkeley computer science department. the office of redress administration (ora) acquired a copy of the data from the bancroft library in 1988 to aid in disbursing reparations to former japanese american incarcerees. upon completion of the agency’s work, the modified file was transferred to the national archives.
nara published the data file it acquired from the ora, along with extensive documentation, as part of its access to archival databases (aad) project in 2003. referred to as the japanese american internee data file, this data file currently serves as an authoritative resource for genealogical information for former inmates and their family members, as well as statistical information about the incarcerated population as a whole. thanks to densho, we were made aware that the more than 110,000 form wra-26 records held at the bancroft were possibly the only remaining complete set, organized by camp, in existence. while digitization for preservation then became the immediate concern for these records of unique and enduring research value, in the spirit of the archives and collections as data movements, the library also wanted to explore how the information in these records might be made available for computational research. digitizing the entire corpus of form wra-26 records provided an opportunity to extract data from these records and create a new dataset which might rectify the gaps, omissions, and errors that are present in the existing data file at nara due to how the data was originally created. to accomplish this, we needed to transcribe and extract an enormous quantity of data from over 220,000 images, an undertaking which was not feasible given current staffing, expertise, and resource levels. we recognized that we would need to add team members to the project with the technical expertise and experience to help us use machine learning to pull data from these records. library staff first met the team members of doxie.ai while they were graduate students in the master of information and data science (mids) program at uc berkeley’s school of information. the team of data scientists had done a previous project, bugtrap (bug transcription and annotation pipeline), that extracted data from images of labels on entomological specimens (doolittle, et al. 2020).
this work demonstrated their interest and experience in developing customized machine learning pipelines to transcribe text from images. because we lacked in-house expertise, we discussed the project and ethical concerns posed by working with these records with the team of data scientists and their faculty advisor. satisfied that this approach would yield the results we hoped for (efficient and high quality text transcription from complex resources), the library partnered with these ucb-affiliated data scientists to automate the text extraction process using ai and machine learning. from an ethical perspective, we worked closely with the team to secure the data and address any ethical or sensitivity concerns in the data. our collaboration with them spanned nearly two years, during which they transitioned from graduate students in uc berkeley’s mids program to an external vendor following their incorporation as doxie.ai (doxie.ai 2024). working with the doxie.ai team, we learned a great deal about what was possible, and impossible, in using ai/machine learning for improved text extraction, as compared to ocr or hand transcription. at the initial stage of the project, we also consulted with experienced data publication colleagues in the library to determine available and appropriate methods for data transfer, storage, and project management tracking. given the sensitivity of the data, we used box to more securely transfer digital files from the library to doxie.ai team members, and created a private github repository to securely store the extracted data—in csv and json formats—and documentation.
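as an illustration of the csv and json storage formats mentioned above, the sketch below serializes a few extracted form records both ways. the field names and sample values are invented for the example; the project's actual schema is not reproduced here.

```python
import csv
import io
import json

def serialize_records(records, fieldnames):
    """write extracted form records (a list of dicts) to csv and json strings."""
    csv_buf = io.StringIO()
    writer = csv.DictWriter(csv_buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    # ensure_ascii=False preserves any non-ascii characters in names
    json_text = json.dumps(records, ensure_ascii=False, indent=2)
    return csv_buf.getvalue(), json_text

# hypothetical records for illustration only
records = [{"name": "example, test", "camp": "topaz", "occupation": "teacher"}]
csv_text, json_text = serialize_records(records, ["name", "camp", "occupation"])
```

keeping both formats side by side lets the csv feed tabular qc review while the json preserves the record structure for computational use.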
with a basic workflow for the transcription process in place, doxie.ai began testing and developing a customized machine learning model for the materials which could be iteratively improved upon as we progressed through the records, moving from camp to camp. all models were supervised; reinforcement learning (rl) or unsupervised learning was not used for this project. doxie employed model fine-tuning to achieve the best results possible given the data drift introduced by variances in the type, spacing, and placement of data from camp to camp. this variance was monitored as the project progressed, and additional data was used to fine-tune the models for a more robust performance. this process involved quite a bit of knowledge sharing between doxie.ai and bancroft staff, as the archivist managing the project noted key characteristics of the records and relayed what content we hoped to transcribe; meanwhile, doxie.ai defined the parameters of what was technologically possible and continually worked to expand the pipeline’s capacity and accuracy to match new observations discovered about the form and content of the records. due to variability in structure, content, and other unique characteristics of the physical records within and across camps, the team adopted an iterative approach which handled sets of form wra-26 records one camp at a time. after a record set from one wra camp was run through the pipeline, results were reviewed by bancroft staff, and doxie.ai attempted to integrate any feedback into future work. a number of challenges arose throughout the process. 
for example, the individual records organized by camp turned out to contain a significant number of forms with entirely handwritten responses, as well as numerous forms with handwritten notations, annotations, and corrections, none of which could be fully or accurately captured in the machine learning transcription process due to limitations with the technology when applied to handwritten materials. additionally, six or seven versions of the form were used; discrepancies in data types, content, and spacing of text on the page disrupted the ability of the ml pipeline to accurately transcribe content. many documents also included stamps, handwritten corrections, strikethroughs, notes, and other marginalia, which presented visual noise that was difficult for automated transcription models to handle. there are limitations to even a finely tuned and customized machine learning pipeline when applied to the complexity and inconsistency of even fielded, form-based data in archival documents.

beyond the challenges of implementing ai/machine learning technology, the project presented a number of ethical issues due to the potentially sensitive nature of the records being digitized, records which contain pii and other information about vulnerable individuals. the project team worked with library and community partners to address ethical and logistical issues raised by the data extraction process, and to assess appropriate access options for the dataset(s) and digitized records. a critical component of this work was forming a community advisory group, which met virtually in august 2022; the goal of this meeting was to garner meaningful community feedback on the work accomplished so far, and to work towards consensus on whether and how to provide access to the digitized forms and dataset.
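the camp-by-camp workflow described above—run the pipeline on one camp's record set, have bancroft staff review the results, and fold the feedback into further fine-tuning—can be sketched roughly as follows. the function names (`extract_records`, `review_batch`, `fine_tune`) are hypothetical stand-ins; doxie.ai's actual pipeline is not public.

```python
def process_camps(camps, model, extract_records, review_batch, fine_tune):
    """iteratively transcribe one camp's records at a time, adapting the
    model between batches to counter drift across form variants."""
    results = {}
    for camp in camps:
        batch = extract_records(model, camp)       # run the ml pipeline
        corrections = review_batch(batch)          # staff qc pass
        if corrections:
            model = fine_tune(model, corrections)  # adapt to this variant
        results[camp] = batch
    return results, model
```

each pass produces a reviewed batch and, when staff flag corrections, an updated model for the next camp's record set—mirroring the supervised, iterative approach the project describes.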
background

this project presented an interesting opportunity to apply ai and machine learning tools to our digital collections. first, the highly structured and (we thought) consistent format of the materials being digitized lent itself to an automated transcription approach. the form-based records included primarily structured and typewritten data; while transcription of historical handwritten texts is notoriously harder to achieve with the current set of tools available, machine learning tools for transcribing typewritten text are fairly advanced and have shown demonstrably good results. additionally, machine learning offered a reasonable solution to logistical constraints; the sheer size of the corpus of digitized materials in question (over 220,000 files) and current staffing and resource levels did not practically accommodate labor-intensive and costly manual transcription. adopting automated tools appeared to be a more efficient and less staff- and cost-intensive approach for achieving a mass transcription project in a timely manner as compared to traditional hand transcription services or direct optical character recognition. we also considered public crowdsourcing tools and digital platforms such as zooniverse and fromthepage, but decided against these options for two reasons: first, the sensitive nature of the records potentially precluded mass public access, and second, the library was not at that time in a position to organize and manage a public crowdsourcing transcription project. the library looked to ai to help provide a significant lift in transcribing data from the digitized records to create a dataset which would improve upon and expand the existing japanese american internee data file held by nara. as mentioned above, the existing data file available at nara, based on the 1940s punch cards, demonstrates a number of key limitations.
the number of migrations the data went through over time introduced or compounded errors and inaccuracies from the original data collection process. even more importantly, a significant amount of detailed information collected in the original paper forms is missing from, or was generalized into broad categories in, the existing data file. in some instances, entire fields were absent from the nara dataset, including significant activities, skills, hobbies, educational and employment history, and the field for additional information. in other cases, the granularity of information was lost through the act of coding responses to a pre-set number of categories or datapoints; for example, occupations were coded to a prescribed set of classifications, producing a loss of significant detail supplied in the original forms. we hoped that ai/machine learning could help us to recover as much of this information as possible, while transforming it into formats more readily accessible for computational research.

inspiration for this project came from many sources. uc berkeley’s digital humanities program launched efforts in 2012, supporting computational “research ready” access to archival materials in a number of early pilot efforts that led the way to this project (berkeley center 2024). digitized corpora are considered “research ready” when they are machine readable and queryable, maintain their original structure, and are annotated and linked to other data on the same topic (adams 2017). the principles and documentation produced by participating in the always already computational: collections as data grant-funded initiative (2016–2018), succeeded in 2018 by the collections as data: part to whole initiative, also sparked this work (padilla 2019, 2023).
our participation in the fantastic futures international conferences (ai4lam), starting in 2019, provided a network of colleagues and partners interested in applying ai/machine learning processes to library, archive, and museum collections.

ethical considerations

the university of california, berkeley, library released its responsible access workflows for digitization projects in 2020, covering four key law and policy areas relevant to digital collections: copyright, contracts, privacy, and ethics. of particular interest for this project was the ethics workflow, which prompts staff to consider whether unfettered digital (or analog) access could result in the harm or exploitation of people, resources, or knowledge. if the answer is yes or uncertain, then the workflow calls for reference to professional and community standards, community engagement, and adapting local policies to better support ethical engagement. this workflow helped guide our risk assessment for the records being digitized and aligned well with our plan to form a community advisory group to consider ethical access to these materials. when writing the jacs grant proposal, we recognized that these records were potentially sensitive and would require additional evaluation prior to being released publicly. the context of the original data collection in wwii raised ethical concerns, as the information was taken under duress and without consent, and documented forcibly relocated and incarcerated individuals. additionally, the original forms contain sensitive information, including personally identifiable information (pii) such as social security numbers, as well as religious affiliations, health conditions, work history, family relationships, hobbies, and personal interests.
implementing an automated transcription tool was useful within the context of this project because it helped us resolve some of the ethical concerns posed by putting these sensitive documents on platforms such as zooniverse for a public transcription project. in the interest of maintaining the relative privacy and security of the materials during the ai implementation stage, we developed a workflow with doxie.ai to transfer digital images securely via box, with extracted data deposited in a private github repository for further evaluation and editing by library staff. doxie.ai customized their machine learning pipeline in ways which further aligned with privacy and ethical concerns; for example, we decided to automatically redact social security and alien registration numbers from the dataset so we would in no way collect or aggregate this data at any point in the project. it is worth noting here that none of the data produced from this project was used by doxie.ai towards their larger corpus of training data. ai models are often the topic of controversy because of bias that is sometimes inherent in the data on which they are trained. this bias is often seen in areas such as facial recognition and text generation. for this project, we did not have any such use cases; the primary use case was ocr and the only language was english, so we were not affected by the above-mentioned biases. one aspect that did present itself was the ocr of japanese first and last names. ocr models often use general language understanding to boost their accuracy, and such data often consists of english names. doxie.ai developed special post-processing of names and places using custom dictionaries in order to accurately transcribe this information.
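to make the two safeguards above concrete—automatic redaction of social security and alien registration numbers, and dictionary-based post-processing of ocr'd names—here is a minimal sketch. the regex patterns and the tiny surname dictionary are invented for illustration; doxie.ai's actual rules and dictionaries are not public.

```python
import re

# hypothetical patterns for id-like tokens; real rules may differ
SSN_RE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")
ALIEN_REG_RE = re.compile(r"\bA-?\d{7,9}\b", re.IGNORECASE)

def redact(text):
    """drop ssn-like and alien-registration-like tokens before storage,
    so the sensitive numbers are never collected or aggregated."""
    text = SSN_RE.sub("[REDACTED]", text)
    return ALIEN_REG_RE.sub("[REDACTED]", text)

# toy lookup table standing in for the custom name dictionaries
SURNAMES = {"yamarnoto": "yamamoto", "talcahashi": "takahashi"}

def correct_names(tokens):
    """fix common ocr confusions (e.g. rn/m) via a dictionary lookup."""
    return [SURNAMES.get(tok, tok) for tok in tokens]
```

running redaction inside the extraction step, rather than afterwards, is what ensures the sensitive numbers never reach the stored csv/json output.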
the library recognized that while these records were of tremendous research value, digitizing and providing access to them without adequate context and community input would constitute poor stewardship. we worked closely with densho, our community memory organization partner, and with our ucb library scholarly communications team to host a community advisory group meeting in august 2022. our community advisors spanned a wide range of geographic and demographic backgrounds within the japanese american community and included former internees and descendants, activists, artists, community historians, writers, and mental health professionals, working in collaboration with librarians and curators. while we intended to hold this meeting in person, after several covid-19 surge-related cancellations, we decided to meet virtually to protect the health and safety of all participants. the project team’s investment in creating a number of resources to inform participants of the project context and goals, as well as the library’s ethics workflows and access policies, led to a very successful virtual community engagement. abundant documentation was produced by bancroft staff, ucb library colleagues in the scholarly communications office, and our community partner organization densho to guide the day-long series of small- and large-group discussions. throughout the meeting, participants were asked to weigh the benefits of public access to the data against the risks of potential harm to community members. during the meeting, a general consensus emerged from the advisory group that the research and community value of the information in these materials outweighed the potential harms of opening the records for research, but that any release should be accompanied by adequate context and description to explain the circumstances under which the information was gathered. some logistical questions remain unresolved.
for example, whether and how to introduce a more layered, multi-level approach to access was not decided, and is dependent in part on staffing capacity and the available technical mechanisms supported by the it infrastructure of the broader ucb library. further follow-up with library stakeholders and key decision-makers is needed before we can propose and implement an ethical access plan for both the digitized records and dataset. until a final decision is made, both sets of resources will remain restricted from public access.

who is affected by this project?

many people and groups are and will be affected by this project. first, and most importantly, these archival records are not openly available to community members, so if and when we are able to make the digitized forms and dataset available, it will have a huge impact on the communities affected by the forced incarceration of japanese americans during wwii. many have never seen the forms or been able to analyze the data that was omitted or generalized in the wwii-era punch cards. the details of work histories, education, hobbies, and other particulars of people’s lives will finally be available to their families. secondly, these records will provide important data to researchers looking at historical patterns represented in this massive demographic dataset. we hope this new information will provide new insights and, in combination with other datasets such as densho’s name registry and the final accountability roster (far) records, provide a bigger picture of the impacts of this period in history (densho “name registry” and “final accountability roster” 2024). on the library level, we plan to provide access to this data as a collection, help users find and access it, and, as much as possible, support computational services around this data.
this will largely impact our it and research support services groups, who need to provide the infrastructure and tools to support these services, as well as the experts in digital scholarship service areas who will support users.

lessons learned and future work

success in this project came from leveraging partnerships to combine technical expertise with content and domain expertise, as well as community knowledge. responsible implementation of ai in this context relied upon these different knowledge communities to collaboratively develop a machine learning pipeline informed by considerations of privacy and ethics, and to apply an ethical framework for co-curation of the various digital resources produced by the project. ethical implementation of ai should be iterative and collaborative, guided by clear policies and ethical frameworks, and informed by community engagement and input. we learned that there is no one-size-fits-all approach to applying ai responsibly in an archival context. this is in large part related to the unique characteristics and contexts of discrete archival collections. not all material within an archival repository lends itself readily to these tools in their current state of development, and selecting material that would benefit most from this process, especially given staffing or resource constraints, depends on a variety of factors. when adopting ai tools for mass transcription or data extraction, we have found it important to consider the structure and content of the material (i.e., does it already contain structured data?), the size or quantity of materials in the collection, the consistency of the physical records, and the risk of harm posed to record creators or subjects in making particular collections available online and accessible for computational research.
the bancroft library is still exploring the extent of our role in creating and providing access to “research-ready” data (i.e., data that is machine readable and queryable, maintains its original structure, and is annotated and linked to other data on the same topic), and how ai/machine learning can be leveraged to help us do this work efficiently and economically. more resources are needed simply to digitize our collections, and the additional cost of extracting research-ready data strains already tight resources. is creating clean and usable data that does not require significant additional work by the researcher enough? this project provided one tangible case study for how we might implement this in the future for similar materials, but we need to improve cost models, potentially by partnering with researchers and community members to improve upon the machine-generated datasets. in this project, the ai/machine learning costs added over 50% on top of our usual digitization costs, and that is a big lift going forward as we still struggle to find funding for digitization alone. our ethical work on this project is still not complete. we anticipate future work will include additional consultation with community members and other stakeholders in alignment with our ethical guidelines. technology has given us an opportunity to provide much needed information to individuals and families affected by the japanese american incarceration during world war ii. we want to provide that information as thoughtfully and ethically as possible, guided by our community partners and individuals who were impacted by that traumatic experience, and by the ethical practices that are developing and taking shape in our field.
documentation

• 6th computational archival science (cas) workshop: using ai/machine learning to extract data from japanese american confinement records
• university of california berkeley library: responsible access workflows
• the bancroft library: jacs6 community advisory group meeting participant packet

acknowledgements

funding for this project was made possible, in part, by grants from the u.s. department of the interior, national park service, japanese american confinement sites grant program, and the henri and tomoye takahashi charitable foundation. the research case study was developed as part of an imls-funded responsible ai project, through grant number lg-252307-ols-22.

competing interests

the authors declare that they have no competing interests.

references

adams, nick. 2017. “from static archive to research-ready database.” library of congress: impact symposium, october 30, 2017. https://youtu.be/ap5nhcujava.

bancroft library. 2019. “japanese american evacuation and resettlement records.” the online archive of california. 2019. https://oac.cdlib.org/findaid/ark:/13030/tf5j49n8kh.

bancroft library. 2022. “jacs6 community advisory group meeting participant packet.” accessed november 10, 2023.
https://docs.google.com/document/d/1idg1oyl12gouisypblz7umsl4bu4b21vuc3jbq0e9b8/edit?usp=sharing.

berkeley center for interdisciplinary critical inquiry. n.d. “summer minor in digital humanities.” accessed february 21, 2024. https://cici.berkeley.edu/programs-and-initiatives/digital-humanities.

collections as data facets. n.d. “#hackfsm.” accessed february 21, 2024. https://collectionsasdata.github.io/facet9.

densho. n.d. “preserving japanese american stories of the past for the generations of tomorrow.” accessed february 21, 2024. https://densho.org.

————. n.d. “densho name registry.” accessed february 21, 2024. https://ddr.densho.org/names.

————. n.d. “final accountability roster.” accessed february 21, 2024. https://encyclopedia.densho.org/final_accountability_roster.

doolittle, austin, cameron ford, vijay singh, and tracey tan. 2020. “bugtrap: (bug transcription and annotation pipeline).” uc berkeley school of information, mids capstone project fall 2020. https://www.ischool.berkeley.edu/projects/2020/bugtrap.

doxie.ai. n.d. “doxie.ai.” accessed february 21, 2024. http://doxie.ai/#.

friedman, marissa, cameron ford, mary elings, vijay singh, and tracey tan. 2021. “using ai/machine learning to extract data from japanese american confinement records.” in ieee bigdata’21 computational archival science: digital records in the age of big data workshop proceedings, virtual, december 17, 2021. https://doi.org/10.1109/bigdata52589.2021.9672076.

national archives. n.d. “[japanese-american internee data file], 1942–1946.” access to archival databases (aad). accessed february 21, 2024. https://aad.archives.gov/aad/fielded-search.jsp?dt=3099&tf=f&cat=all.

padilla, thomas, laurie allen, hannah frost, sara potvin, elizabeth russey roke, and stewart varner. 2019. “always already computational: collections as data: final report.” digitalcommons@university of nebraska – lincoln. https://digitalcommons.unl.edu/scholcom/181.
padilla, thomas, hannah scates kettler, and yasmeen shorish. 2023. “collections as data: part to whole final report.” zenodo. https://doi.org/10.5281/zenodo.10161976.

university of california regents. 2021. “berkeley library, university of california, responsibleaccessworkflows_public_cc-by-nc-4.0.” accessed november 10, 2023. https://docs.google.com/presentation/d/1v66pgpiq9xqxxdvngpd3rkamoiw2hiyvvds4iv4vfom/edit.

journal of escience librarianship 13 (1): e805 doi: https://doi.org/10.7191/jeslib.805 issn 2161-3974

full-length paper

responsible ai at the vanderbilt television news archive: a case study

clifford blake anderson, vanderbilt university, nashville, tn, usa, clifford.anderson@vanderbilt.edu
jim duran, vanderbilt university, nashville, tn, usa

abstract

we provide an overview of the use of machine learning and artificial intelligence at the vanderbilt television news archive (vtna).
after surveying our major initiatives to date, which include the full transcription of the collection using a custom language model deployed on amazon web services (aws), we address some ethical considerations we encountered, including the possibility of staff downsizing and misidentification of individuals in news recordings.

received: october 2, 2023 accepted: february 5, 2024 published: march 5, 2024

keywords: artificial intelligence, ai, audiovisual archives, television news, metadata automation

citation: anderson, clifford blake and jim duran. 2024. “responsible ai at the vanderbilt television news archive: a case study.” journal of escience librarianship 13 (1): e805. https://doi.org/10.7191/jeslib.805.

the journal of escience librarianship is a peer-reviewed open access journal. © 2024 the author(s). this is an open-access article distributed under the terms of the creative commons attribution-noncommercial 4.0 international (cc by-nc 4.0), which permits unrestricted use, distribution, and reproduction in any medium non-commercially, provided the original author and source are credited. see https://creativecommons.org/licenses/by-nc/4.0.

summary

our goal is to increase the research value of the vanderbilt television news archive by making our collection computationally tractable. we began by using automated speech recognition (asr) to create transcripts and then applied ai tools to refine and enhance the textual dataset by organizing the collection and increasing the quality and quantity of its metadata.
we are currently testing computer vision tools to extract information from the video stream accompanying the audio track of a recorded broadcast tv signal and to automate metadata production and discovery.

project details

founded in august 1968, the television news archive at vanderbilt university (vtna) numbers among the longest-running audiovisual archives for broadcast news in the world. over the course of its existence, the tv news archive has negotiated multiple technology shifts. in 2016, clifford anderson assumed responsibility for the direction of the tv news archive as part of his associate university librarian, or aul, portfolio at the university. he quickly recognized the need to upgrade several systems in light of newly available computational tools and newly emerging research requests. in 2018, jim duran was named the new director and pointed out another problem common to many archives: a growing backlog of undescribed materials. beyond the backlog, the search interface lacked an open api and did not allow for computational analysis, by then a frequent request from vtna’s researchers.1 we agreed that the limiting factor in both challenges was metadata. writing abstracts, the key metadata field in the vtna database, is a labor-intensive process that requires training, consistency, and persistence. the vtna adds 3.5 hours of evening news content every day, including weekends. at its peak, the vtna employed thirteen staff members but, by 2018, was down to five, only two of whom dedicated themselves full-time to writing abstracts. the vtna held about 3,000 episodes of news needing metadata before they could be made available to researchers. duran set out to find ways to automate either abstract creation or transcript generation. the primary goal was to eliminate the backlog and make the collection completely available to researchers.
if the project could demonstrate how to speed up the acquisition, processing, and description of the collection, then he hoped to find a path toward increasing the daily recording capacity and expanding the vtna’s collection scope. at the same time, anderson laid down plans for the creation of a data lake at the university library. a data lake is effectively a repository of data sets that provides a computational environment, including on-demand cluster computing, to analyze those data. marcus weaver, a recently hired cloud engineer at the vtna, imported the tv news data set into the nascent data lake, allowing staff members to query and analyze the corpus as a whole. in what follows, we outline the key steps that allowed us to transcribe, annotate, and render the vtna’s collection computationally tractable in a data lake.

voice-to-text transcription project

the goal was to make transcripts and captions available to patrons of the vtna as well as a data mining resource in the data lake. many longtime users of the vtna know the recordings unfortunately do not include captions—the text of spoken words used by people with hearing impairments, or by anyone preferring to read rather than listen to the news.

1 an api (application programming interface) connects different software systems and enables them to communicate and share data seamlessly. in this context, an api would allow researchers to write code on their computers that queries the vtna database in a secure and repeatable manner. from our perspective, an api would need to provide us with logs and reports on activity as well as a method for limiting use per user and session.
in 2022, the vtna partnered with vanderbilt university library administration and the college of arts and science to utilize asr to generate transcripts and captions for 62,000 hours of recorded television news from august 1968 to june 2022. this project was partially funded by a budgetary surplus in the last quarter of fiscal year 2022-23. we had to complete the project in roughly four months before the start of the next budget cycle. vtna staff had long recognized the importance of captions and transcripts, but developments and improvements in automated tools made this project feasible. working with clifford anderson, duran devised a plan that took advantage of our existing usage of amazon web services (aws), which offered the ability to immediately begin transcribing with current operational funding. he created a workflow that used four python scripts running on three different computers to request 3,000 to 5,000 transcripts at once. when those transcripts finished, duran would order another batch until he completed 89,000 transcripts. the team decided on this approach, rather than using an asr vendor such as trint, because there was no need for segmentation or human interaction with the transcripts. trint works best with a person interacting with each transcript, whereas the goal of this project was to automate the entire process to finish before the funding deadline. the aws transcribe service uses asr to generate transcripts of the audio tracks in the digital video files. out of the box, the service provides a proprietary language model based on commercial data, like call center recordings. duran wanted to increase the accuracy of the transcripts by applying a custom language model. amazon allows users to feed the transcribe service examples of text that resemble the desired output of a transcript job to improve the accuracy through machine learning. a sample needed to be 250,000 to 500,000 words of near-perfect text, also known as a training set.
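the batch-submission loop described above can be sketched as follows. this is a minimal illustration, not the production scripts: the output bucket, the job-name scheme, and the custom model name `vtna-custom-lm` are hypothetical, and boto3's transcribe client is assumed to already be configured with credentials.

```python
def batches(media_uris, batch_size=3000):
    """split the full list of s3 media uris into submission-sized batches."""
    for i in range(0, len(media_uris), batch_size):
        yield media_uris[i:i + batch_size]

def submit_batch(batch, model_name="vtna-custom-lm"):
    """submit one batch of transcription jobs against the custom language model."""
    import boto3  # imported here so the batching helper stays dependency-free
    transcribe = boto3.client("transcribe")
    for uri in batch:
        # derive a job name from the file name, e.g. "abc-1968-08-05" from "...mp4"
        job_name = uri.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        transcribe.start_transcription_job(
            TranscriptionJobName=job_name,
            Media={"MediaFileUri": uri},
            MediaFormat="mp4",
            LanguageCode="en-US",
            ModelSettings={"LanguageModelName": model_name},
            OutputBucketName="vtna-transcripts",  # hypothetical output bucket
        )
```

in practice one would poll `get_transcription_job` (or list jobs by status) and submit the next batch only as the previous one drains, mirroring the order-a-batch-then-wait workflow described above.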
to get these near-perfect sample transcripts, we serendipitously discovered vanderbilt university had just licensed 3play media, a professional transcription service that employs transcriptionists to produce 99% accurate transcripts. susan grider, then the administrative manager of vtna, worked quickly to create a new account for the vtna, and duran ordered approximately 75 transcripts for each american presidential administration from richard nixon to donald trump. he decided to split the collection by president because of the likely transition of frequent names in the news. regarding frequent names, 3play media allowed users to provide transcriptionists with notes on the spelling of tricky domain-specific terms. duran worked with dana currier, metadata specialist and student supervisor, to dig into the vtna’s existing database that included the proper spelling of names and places featured in the news. once they determined a strategy, duran wrote a new python script that found the top 600 names in the news for each presidential administration. that list was formatted into a pdf and shared with the transcriptionists at 3play media for even better spelling accuracy. once the near-perfect transcripts were returned to the archive, duran compiled them into a training set for aws and created a custom language model. that model was then used on all news recordings for the specific date range.

named entity recognition (ner)

generating a title for a news segment was a more challenging endeavor, for which we once again turned to ai tools for help.
each news story and commercial required a title that matched the existing title style, which roughly followed the pattern location / main subject / [optional: subtopic], for example: “california / earthquake / fema.” before the introduction of asr transcripts, titles were of course written by trained abstractors. duran needed a new solution that was fast and consistent. using python scripts, the amazon web services (aws) command line interface (cli), and aws comprehend, duran extracted named entities from the body of the transcript for each segment. as documentation at aws describes this service, “amazon comprehend is a natural-language processing (nlp) service that uses machine learning to uncover valuable insights and connections in text” (amazon web services 2023). a named entity is a person, location, organization, or commercial product, and aws comprehend outputs all entities as a formatted json list. our script takes the list of entities, ranks them based on frequency, and concatenates them into a string to match the style of the titles written by abstractors. the results were satisfactory but not ideal. first, a reporter’s name was almost always the first result, because reporters are typically mentioned several times in a story. reporters should be identified, but not in the title. duran turned to steve baskauf, data science and data curation specialist, for advice. given a list of known reporters for each network, baskauf developed a python function that utilized fuzzy matching to filter the reporters out of the title generation script. fuzzy matching increased the accuracy of the filter by recognizing common misspellings or spelling variations like john and jon. once a reporter’s name was identified, that data point was stored in a field for reporters.
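a hedged sketch of that rank-and-concatenate step, including the reporter filter: the entity dicts mimic the shape of comprehend's output (a `Text` key per entity), and the fuzzy matching below uses python's stdlib `difflib` with an invented threshold rather than the actual function baskauf wrote.

```python
from collections import Counter
from difflib import SequenceMatcher

def is_reporter(name, known_reporters, threshold=0.85):
    """fuzzy-match a name against the network's reporter roster, so that
    spelling variants like 'jon' for 'john' are still caught."""
    return any(
        SequenceMatcher(None, name.lower(), r.lower()).ratio() >= threshold
        for r in known_reporters
    )

def make_title(entities, known_reporters, max_parts=3):
    """rank non-reporter entities by frequency and join the top few
    in the house 'part / part / part' title style."""
    counts = Counter(
        e["Text"] for e in entities
        if not is_reporter(e["Text"], known_reporters)
    )
    return " / ".join(text for text, _ in counts.most_common(max_parts))
```

with invented input, `make_title` on a segment that mentions “california” three times, “fema” twice, and reporter “jon smith” four times (roster: “john smith”) yields “california / fema”: the reporter is filtered even though the spelling differs.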
the second issue was that some stories were less about an entity and more about a concept or topic, like gun violence, climate change, or foreign policy. these subject terms are not entities; they often will not even be mentioned in a transcript explicitly but are implied in the newsworthiness of the entire news report. television news follows certain themes, specific aspects of society, that go beyond a news cycle, and ner cannot recognize those concepts. we have not yet resolved this issue but are exploring options for adding subject headings using auto-classification ai. nevertheless, the people, places, and other details of each news story are highly valuable, so ner is useful as a tool to recognize the entities.

creating a data lake for computational analysis

with the newly created transcripts and supplied titles, the next goal was to make these data searchable and available for machine learning. in another context, anderson had already implemented a preliminary data lake for the university library. he recommended adding the transcripts and other show-related metadata to that nascent data lake2 with the goal of creating a repository of news-related data for use by vanderbilt university faculty and students. this data solution will be a new resource for data scientists interested in researching events as they are documented in multiple news sources, from periodical literature to broadcast news. the data lake will also make it possible to create and fine-tune machine learning models on news sources. vtna receives one to three requests for computational access per month on average, but previously lacked the infrastructure and transcripts for most ai/ml projects.
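to make the idea of computational access concrete, here is a sketch of the kind of query a researcher might run against transcripts in the data lake. the table and column names are invented, and the choice of aws athena as the sql engine is our assumption for illustration, not a description of the production system.

```python
def build_mentions_query(term, start_year, end_year, table="transcripts"):
    """sql counting news segments that mention a term, by network and year."""
    return (
        f"SELECT network, year, COUNT(*) AS segments "
        f"FROM {table} "
        f"WHERE lower(transcript_text) LIKE '%{term.lower()}%' "
        f"AND year BETWEEN {start_year} AND {end_year} "
        f"GROUP BY network, year ORDER BY year"
    )

def run_query(sql, database="vtna_lake", output="s3://vtna-athena-results/"):
    """submit the query to athena; database and output location are hypothetical."""
    import boto3  # imported here so the query builder stays dependency-free
    athena = boto3.client("athena")
    return athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output},
    )["QueryExecutionId"]
```

a researcher could then, for example, chart coverage of “space shuttle” by network across 1981–1986 from the query results.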
we are excited about the research potential of the newly created transcripts, especially when combined with the existing database of time-coded, titled, and abstracted news stories and commercials. by merging these datasets, users will have a truly unique and powerful source for studying news media across many decades. in addition to the textual datasets of transcripts and abstracts, we hope to build in capacity for the data lake to accommodate the study of visual and audio elements of the digital video. for example, a researcher could use machine learning to identify the usage of specific imagery like wildfires, airline accidents, or law enforcement activities, or even more abstruse subjects like color schemes or sound effects. ultimately, the data lake will provide the environment for supervised and unsupervised machine learning projects at the vtna. several of the prospective projects described below assume the data lake as their computational environment.

2 a ‘data lake’ describes a big data environment for aggregating data sets of heterogeneous provenance and format. for more information, see anderson 2022.

background

the vanderbilt television news archive is like many other audiovisual archives. it consists of a collection of digitized tapes, a catalog that describes the video content, and researchers who request access to material. unlike many archives, our collection is highly homogeneous, consisting entirely of national broadcasts of television news. it is split into two sub-collections: special reports and evening news. the evening news collection demands the most of our time and resources because, unlike the specials, which can be briefly summarized, an evening news report consists of segments of news reports and commercials.
in 1973, the vtna office created a workflow in which abstractors watched an episode of news and completed three tasks for each news segment and commercial break:

1. identify the start and stop time of the segment;
2. summarize the news segment, focusing on people, places, and the main topic of the report; and
3. give each news story a title and, for each commercial break, list all the products and services advertised.

for nearly fifty years, this metadata creation process continued unchanged; today, the vtna consists of 1.4 million records, dating from 1968 to the present. the tv news archive operates today essentially as a database resource. users of the resource interact with a website search page to query the collection’s metadata to determine if the collection includes clips of television news relevant to their research topic. like most distinctive collections held by libraries and archives, the metadata is key to discovery and use. without a properly described collection, users won’t know what secrets are held within the pages, photos, or, in our case, forgotten news stories and tv commercials. but the lack of helpful descriptive text and context is not a problem unique to library databases. the corporate world has similar problems describing its commercial products or summarizing and sorting the vast amounts of data streaming into privately held data warehouses. we all needed tools to help us make sense of the data.

ethical considerations

our project raised several ethical issues, including the possibility of downsizing our workforce, the misidentification of entities that we extract from transcripts, and the potential violation of privacy involved with facial recognition software.

ai replacing skilled labor?

there is a growing concern that artificial intelligence will take work away from people, but this was not a concern for us. when the vtna sought ai/ml tools to generate transcripts and enhance metadata, the goal was not to replace skilled labor.
the vtna saw the same reduction in staffing that most libraries experienced during the past three to four decades. but that reduction in staff came at the same time as the material being collected by libraries grew in volume and density. we were tasked to do more with less—a problematic challenge. but the growth in information was also accompanied by a transformation from analog to digital, which opened the door for ai/ml tools to assist with the challenge. for the time being, adopting ai/ml tools does not threaten staffing levels because cultural heritage institutions are already short-staffed, and the field is trying to ride the wave of digital information threatening to flood our repositories. so while we do not see automation tools as a way to replace labor, we do regard ai as a way to create more equitable and sustainable workloads for our staff. in our case, we even explored the option of deploying a new cohort of workers to tackle the backlog. not only did we find this option cost-prohibitive, but we also determined the task of writing abstracts cannot be done by temporary, entry-level, or outsourced staffing. we found that a fully trained and skilled abstractor could complete one hour of video content in six to eight hours. additionally, the worker needs to stay on task to complete the episode, but they also need breaks to avoid burnout. in essence, abstract summaries require full-time skilled labor, and we had too many episodes to complete and not enough funding. finding computer-based alternatives to human labor was the only option to meet our needs. that said, we do foresee that artificial intelligence and machine learning tools will affect the type of skills and experience we will seek in new staff members in the future.
for example, we recently hired a cloud engineer to assist with automating our workflows and improving our discoverability systems. as ai/ml make it possible to automate repetitive tasks, we expect that moves to these automated systems will free staff members to work in other areas, particularly reference, outreach, and marketing. as ai tools for abstracting, indexing, and summarization improve, we will likely not rehire in these areas. of course, these tools are not perfect and will need human review, so it will be important to keep at least one expert metadata specialist on staff to review any machine-generated metadata.

the correct spelling of names

spelling first and last names correctly has been a priority of the vtna from the very beginning. one reason abstracts were so difficult to produce was that the abstractor would take the time to look up a person’s name if it wasn’t displayed on the screen. asr transcripts capture only spoken words, and a transcript may not include a person’s name at all if the news network left the name out of the script and identified the speaker only with a screen graphic. so, as we moved to asr transcripts, we had to accept an increase in inaccurate and missing names as a consequence. weighing the importance of quantity over quality may not be an ethical dilemma, but it certainly plays a key role in adopting ai/ml tools. we had to accept a small increase in spelling mistakes in order to move forward with this workflow. it is no different from the core principle of “more product, less process” (mplp) introduced by mark a. greene and dennis meissner, where archivists were encouraged to reconsider the expectations of arrangement and description of archival collections (greene and meissner 2005). in both situations, mplp and ai/ml tool adoption, the goal is to make archival material discoverable and accessible to researchers in an efficient and timely manner.
nevertheless, we believe that ai/ml tools should not be adopted without taking a critical look at their shortcomings. we need to consider our reference instruction and search strategies, then communicate any changes regarding past data collection and description practices to our user community. additionally, we need to consider the impact on individuals with non-english names. the asr models we used were trained on the english language, using sample data from american media. the model will be most accurate with names common in the u.s. non-english names will have more spelling mistakes, which should raise concerns of equity and bias. we will need to recognize this problem and prioritize its correction in future projects. screen reading algorithms, for example, can assist with this problem by identifying any names spelled on the screen.

facial recognition

finally, we have elected to pause some projects due to inherent ethical privacy concerns. we have contemplated, for instance, developing a prosopography of major figures in the news. the idea was to trace the chronological appearances of public figures from different domains of culture (politics, business, society, sports, music, etc.) across networks. this project could be accomplished with off-the-shelf tools such as aws rekognition. after consulting with experts at the internet archive and the gdelt project, we elected to suspend this project because of the potential for misidentifying individuals. we also worried about the incidental exposure of nonpublic figures who might appear in the background of news programs, which could lead to a loss of privacy. we would like to resume this project when we have better protocols in place to address these ethical hazards.

who is affected by this project?
staff: metadata creation

the staff members responsible for metadata were the most impacted by the development of ai/ml tools, and their workflows changed the most. eliminating the backlog was the driving force for our search for new models. personnel at the archive were aware of the usefulness of transcripts as an alternative to a summary description. many researchers have requested transcripts, but the vtna did not have them. the closed caption stream in a television broadcast is a common source for textual data, but caption streams were not captured by the vtna. without access to the source textual data, we explored options for generating text transcripts and captions automatically using automated speech recognition (asr) software. this software continues to improve in accuracy and efficiency. by 2018, we were satisfied with the accuracy and pricing offered by trint. this product not only used a language model that matched our subject matter, but its user interface was easy to navigate and easy to learn. with trint, our metadata department established a new workflow that focused on segmentation and on spelling names and places correctly. the new process could be performed by temporary and student workers. a new employee would finish training in two hours, and, within a week, they were working at full speed. where abstracts took new employees eight hours to finish one episode after weeks of practice, the new workflow required as little as two hours of labor for the same video runtime. with this new platform, vtna was able to eliminate the backlog in eighteen months. the new asr transcript program proved to be a successful replacement for our existing metadata creation process, although it still required additional work to integrate into our content management and database discovery system.
the trint web application offered a variety of output formats for transcripts and captions, but none of the options integrated with our existing database of 1.4 million records. we needed a crosswalk from time-based transcript files to mysql database fields, including title and description. for the description, we used the first 500 characters of the news segment’s transcript. this limited use of a transcript offered a level of summarization but kept the usage well below any copyright infringement.

patrons: providing closed captions

the vtna used the time-coded transcripts to embed closed captions in all access copies of the collection. video captioning is an essential element of accessible collections, allowing users to read the spoken words. the videos recorded by the television archive did not include captions originally. using the asr transcripts, we added the text to the video streams using the open-source tool ffmpeg, managed with python. the process took some time to complete—starting with the oldest files and moving to the present, we finished the project in three months.

patrons: data use agreements

as we build out the data lake, a key concern is developing terms of service for researchers. should the vtna impose any terms for research beyond those stipulated in our licensing agreements? if so, what additional terms would be reasonable to impose? we have consulted with our office of sponsored research about the potential of using so-called “data use agreements” when providing access to the data lake, but it is not clear that such agreements apply when data is being used internally and not shared with external partners.

lessons learned and future work

future work: abstracting and indexing

a near-term project is the summarization of news segments. as noted above, the vtna relies on full-time staff to write abstracts for segments of television news programming.
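the caption-embedding pass described above (adding asr-derived captions to access copies with ffmpeg under python control) can be sketched like this. the exact flags we used may have differed; the pattern below, muxing an srt file into mp4 as a `mov_text` subtitle track without re-encoding, is the standard one.

```python
import subprocess

def caption_cmd(video_in, srt_in, video_out):
    """build the ffmpeg command that muxes an srt caption track into an mp4
    without re-encoding the audio or video streams."""
    return [
        "ffmpeg", "-y",
        "-i", video_in,
        "-i", srt_in,
        "-c", "copy",            # copy audio/video streams untouched
        "-c:s", "mov_text",      # convert srt to the mp4-native subtitle codec
        "-metadata:s:s:0", "language=eng",
        video_out,
    ]

def embed_captions(video_in, srt_in, video_out):
    """run ffmpeg; raises CalledProcessError if the mux fails."""
    subprocess.run(caption_cmd(video_in, srt_in, video_out), check=True)
```

because nothing is re-encoded, each file takes seconds rather than minutes, which is what made a collection-wide pass over decades of recordings feasible in a few months.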
given the time and expertise required, we decided to reduce the number of abstracts we create each week, limiting them to a subset of our collection. as we explained, we substituted full text transcripts for abstracts to foster the searchability of our collection. still, abstracts provide a succinct overview of segments and, crucially, do so in language that is not under copyright, since our abstracts represent interpretations of the news. so, while we cannot display full text transcripts to the public (or, at least, cannot display more than limited snippets of those transcripts), we can display the abstract publicly. currently, the abstract provides a key resource for external users searching our collection in advance of transacting a loan. commercial and open-source machine learning models for summarizing text have existed for some years. but we must differentiate between tools that perform extractive and abstractive summarization. extractive summarization works by identifying the most significant sentences in any given text, essentially “boiling down” that text to key passages. by contrast, abstractive summarization rewrites the text, drawing on new language to represent the meaning of the passage. for our purposes, extractive summarization does not satisfy our abstracting requirements since the technique cannot learn our implicit patterns for writing abstracts and does not free us from copyright constraints since it reuses the literal text. abstractive summarization has the potential to satisfy both goals but was not effective until very recently. the advent of large language models (llms) from openai, google, and others makes conceivable the application of abstractive3 summarization to our corpus. these tools are trained on enormous quantities of text, making it possible to apply them to corpora without fine-tuning.
by reverse engineering the rules for writing abstracts, we should be able to write complex prompts for abstracting the transcripts of news segments according to our traditional standards. however, some of these models, including gpt-3.5, allow for fine-tuning. in this scenario, we would need to supply manually produced abstracts and transcripts for several hundred segments, essentially “teaching” the model how to “predict” or, in plainer terms, to write summaries of future segments. now that the openai api has significantly expanded its context limits (to 8k and 32k tokens), such fine-tuning is within reach, though it would come at significant expense ($0.03 to $0.06 per 1k tokens). progress is also being made in techniques for diarization and chapterization. the goal of diarization is to detect changes in speakers. working from audio files, diarization technologies detect and label speakers. by developing the prosopography of journalists and newsmakers, we could add valuable context to our transcripts, allowing researchers to study the interaction between journalists and their subjects. (but see our note about the related ethical concerns above.) chapterization divides heterogeneous a/v into semantic segments. in the case of television news, for example, chapterization would divide the shows into news segments. tools such as automated scene detection (asd), a machine-learning technique used in video production, can already divide video into units. as the name implies, the technique could be used to detect transitions in news shows, a possibility that media researchers have been investigating for more than two decades (zhu et al. 2001). however, news segments frequently have many scene changes, making asd too fine-grained an approach. research into chapterization of the news continues, drawing on multimodal clues from video and text (screen ocr), for example, to distinguish segments in news shows (rozsa and mocofan 2022).
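the prompt-based approach to abstracting described above can be sketched as follows. the house-style instructions and the model name are illustrative stand-ins, not our settled production prompt, and the api call assumes an openai client configured with a key.

```python
# illustrative encoding of our reverse-engineered abstracting rules
HOUSE_STYLE = (
    "write a one-paragraph abstract of the following television news segment. "
    "name the people, places, and main topic; use neutral, third-person "
    "language; do not quote the transcript verbatim."
)

def abstract_prompt(transcript, max_chars=12000):
    """assemble the prompt, truncating long transcripts to fit context limits."""
    return f"{HOUSE_STYLE}\n\ntranscript:\n{transcript[:max_chars]}"

def generate_abstract(transcript, model="gpt-4"):
    """send the prompt to a chat model; model choice and parameters are illustrative."""
    from openai import OpenAI  # imported here so the prompt builder stays dependency-free
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": abstract_prompt(transcript)}],
    )
    return resp.choices[0].message.content
```

a fine-tuned variant would replace the long instruction block with training pairs of transcripts and human-written abstracts, as described above.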
we expect that pragmatic methods for both diarization and chapterization will be available at scale within the next decade, allowing us to automate key parts of our video capture and preservation process while still maintaining our distinctive approach to identifying speakers and describing news segments.

3 abstractive summarization differs from extractive summarization by focusing on preserving the semantic context rather than identifying key phrases when condensing texts.

future work: analytical tools

the vtna not only preserves the news, it also provides tools for researchers to analyze the news. at present, our main offering is a search engine, which permits researchers to look up shows by date and network, and to search for keywords in titles, abstracts, and transcripts. our discovery system works reasonably well, allowing users to find related news stories by using key terms. however, we believe that we can provide superior tools for analysis. there are two near-term projects that we expect to enable qualitatively superior analysis. the first is the deployment of a graph database to bring the latent social graph in television news to the surface. a graph would show linkages between news segments and news shows, allowing users to traverse segments based on their nearness or distance to events rather than on keywords. when combined with natural language processing techniques such as topic modeling and sentiment analysis, graphs would show patterns of coverage over time. so, for instance, a user could trace how different news networks cover evolving stories, revealing the amount of time that networks devote to topics, the language they use to cover them, and the tone they employ during their coverage. pending budgetary approval, the vtna is planning to deploy a graph database called neo4j to enable such network analysis.
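as a sketch of what such graph traversal could look like, the cypher query below links segments that mention the same entities. the node labels, relationship type, and properties are a hypothetical schema of our own invention, and the driver call assumes a locally running neo4j instance.

```python
# hypothetical schema: (:Segment)-[:MENTIONS]->(:Entity)
SEGMENT_LINKS = """
MATCH (a:Segment)-[:MENTIONS]->(e:Entity)<-[:MENTIONS]-(b:Segment)
WHERE a.airdate < b.airdate
RETURN a.title AS earlier, b.title AS later, count(e) AS shared
ORDER BY shared DESC LIMIT $limit
"""

def related_segments(limit=25, uri="bolt://localhost:7687"):
    """return pairs of segments ranked by how many entities they share."""
    from neo4j import GraphDatabase  # imported here; credentials are placeholders
    driver = GraphDatabase.driver(uri, auth=("neo4j", "password"))
    with driver.session() as session:
        return [record.data() for record in session.run(SEGMENT_LINKS, limit=limit)]
```

a query like this lets a user hop from a segment to the segments nearest to it in the event graph, rather than relying on keyword overlap.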
the second is the provisioning of a vector database for vector-based, or semantic, searching of the vtna’s collection. a vector database requires that you create word embeddings from your search documents. in simple terms, this means converting each word into a vector of numbers and then storing those vectors in a tensor, or multidimensional array. unlike traditional ‘bag of words’ approaches, words in the embedding maintain their relationships to other words in the tensor, allowing the use of techniques like cosine similarity to measure the similarity or distance between vectors of words, i.e., sentences. the techniques for creating word embeddings have grown in sophistication since the invention of the word2vec algorithm in 2013, but storing and retrieving information from these embeddings for production use has proved challenging. recently, a new class of vector databases has emerged to meet this need, offering databases with search engine-like capabilities that provide results based on semantic similarity rather than keyword matching. so, for example, a patron could search for the phrase “space shuttle” and receive hits from documents that mention only “challenger,” as well as related segments that satisfy some level of similarity. tentatively, we plan to use openai’s embeddings to create word embeddings from our abstracts and transcripts and the pinecone vector database to productionize our semantic search engine. the use of word embeddings may address the problem noted above with variant spellings of personal names by allowing us to identify clusters of closely related names in a fashion akin to topic modeling and, if we desire, to create a thesaurus of names with known variants. but word embeddings also introduce a different problem, namely, the surfacing of implicit bias in the news.
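the similarity arithmetic behind semantic search can be shown with a toy example. real embeddings have hundreds or thousands of dimensions; the two-dimensional vectors and document names below are invented purely for illustration.

```python
from math import sqrt

def cosine(u, v):
    """cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(query_vec, doc_vecs, k=3):
    """rank document ids by semantic similarity to the query embedding."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]
```

in a real system, the query “space shuttle” and a document mentioning only “challenger” would be embedded near one another, so the document ranks highly even with zero keyword overlap.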
given the historical nature of our collection, we may find problematic associations surfacing in the relationships between words, potentially reinforcing stereotypes of women, minorities, and people of other nationalities.

a different kind of analysis is made possible through the generation of ngrams from our corpus. an ngram is an ordered sequence of word tokens. as the name implies, ngrams can be of any length but are usually two to three tokens in sequence. by using ngrams, it becomes easier to differentiate between topics in a corpus, for example "washington, dc" and "george washington." in the field of audiovisual media, dr. kalev leetaru of the gdelt project has released an ngrams dataset for television news, which can be used to analyze the topics discussed on different networks (leetaru 2021). the vtna is considering generating ngrams of its data to complement that dataset, but we need to work with legal counsel to assure that these ngrams are not considered derivative works of the copyrighted broadcasts. the gdelt project has also pioneered the concept of so-called visual ngrams, which sample audiovisual broadcasts at defined intervals, taking snapshots of the news show (leetaru 2022). these visual ngrams fall under the provisions of "fair use" and again allow comparison between the networks' coverage of topics.

conclusion

the vanderbilt television news archive has sustained its operation for more than 50 years, though its existence was at times challenged by the cost of capturing, describing, and preserving the news.4 we have also continually sought ways to fulfill our public mission of providing access to the cultural record of television news, pushing the technological frontiers in order to expand access to our collection.
by drawing on new ai/ml techniques, the vtna has found ways to make itself more sustainable, by lowering the cost of indexing and abstracting, while also expanding its audience, by providing new ways to search the collection. as we move forward in the ai/ml era, we hope to build on these early successes while keeping a careful eye on the potential ethical pitfalls of misdescription, de-professionalization, and compromised privacy.

4 on the sustainability challenges at the vtna, see marcum 2013.

documentation

figure 1: complete workflow, 2024. this workflow depicts the entire process in which video streams are acquired, processed, described, and accessed by the various user groups of the tv news archive.

acknowledgements

vanderbilt university is a leader in the development of artificial intelligence and machine learning. the computer science department has several faculty who specialize in these fields. the college of arts and science at vanderbilt has also commissioned an interdisciplinary group of faculty from the humanities and social sciences to address the grand challenge of artificial intelligence. one of the writers of this case study is participating in that grand challenge on behalf of the university library. the research case study was developed as part of an imls-funded responsible ai project, through grant number lg-252307-ols-22.

competing interests

the authors declare that they have no competing interests.

references

amazon web services. "amazon comprehend." accessed march 21, 2023. https://aws.amazon.com/comprehend.

anderson, clifford b. 2022.
"an introduction to data lakes for academic librarians." information services & use 42 (3–4): 397–407. https://doi.org/10.3233/isu-220176.

greene, mark, and dennis meissner. 2005. "more product, less process: revamping traditional archival processing." the american archivist 68 (2): 208–63. https://doi.org/10.17723/aarc.68.2.c741823776k65863.

leetaru, kalev. 2021. "announcing the new web news ngrams 3.0 dataset." the gdelt project blog, december 15, 2021. https://blog.gdeltproject.org/announcing-the-new-web-news-ngrams-3-0-dataset.

leetaru, kalev. 2022. "visual explorer: 5.25 million broadcasts totaling 12.3 billion seconds of airtime = 3 billion analyzable images spanning 50 countries and 1 quadrillion pixels." the gdelt project blog, november 8, 2022. https://blog.gdeltproject.org/visual-explorer-5-25-million-broadcasts-totaling-12-3-billion-seconds-of-airtime-3-billion-analyzable-images-spanning-50-countries-and-1-quadrillion-pixels.

marcum, deanna. 2013. "vanderbilt television news archive." ithaka s+r case study. https://doi.org/10.18665/sr.22672.

rozsa, benjamin, and muguras mocofan. 2022. "tv news database indexing system with video structure analysis, representative images extractions and ocr for news titles." in 2022 international symposium on electronics and telecommunications (isetc), 1–4. https://doi.org/10.1109/isetc56213.2022.10010319.

zhu, xingquan, lide wu, xiangyang xue, xiaoye lu, and jianping fan. 2001. "automatic scene detection in news program by integrating visual feature and rules." in advances in multimedia information processing — pcm 2001, edited by heung-yeung shum, mark liao, and shih-fu chang, 843–48. lecture notes in computer science. berlin, heidelberg: springer. https://doi.org/10.1007/3-540-45453-5_109.
ethical considerations in utilizing artificial intelligence for analyzing the nhgri's history of genomics and human genome project archives

journal of escience librarianship 13 (1): e811. doi: https://doi.org/10.7191/jeslib.811. issn 2161-3974

full-length paper

ethical considerations in utilizing artificial intelligence for analyzing the nhgri's history of genomics and human genome project archives

mohammad hosseini, northwestern university feinberg school of medicine, chicago, il, usa, mohammad.hosseini@northwestern.edu
spencer hong, national institutes of health, bethesda, md, usa
kristi holmes, northwestern university feinberg school of medicine, chicago, il, usa
kris wetterstrand, national institutes of health, bethesda, md, usa
christopher donohue, national institutes of health, bethesda, md, usa
luis a. nunes amaral, northwestern university, evanston, il, usa
thomas stoeger, northwestern university feinberg school of medicine, chicago, il, usa

abstract

understanding "how to optimize the production of scientific knowledge" is paramount to those who support scientific research—funders as well as research institutions—to the communities served, and to researchers.
structured archives can help all involved to learn what decisions and processes help or hinder the production of new knowledge. using artificial intelligence (ai) and large language models (llms), we recently created the first structured digital representation of the historic archives of the national human genome research institute (nhgri), part of the national institutes of health.

received: november 15, 2023
accepted: february 5, 2024
published: march 5, 2024

keywords: artificial intelligence, ai, large language models, national human genome research institute (u.s.), human genome project, ethics, responsibility, privacy

citation: hosseini, mohammad, spencer hong, kristi holmes, kris wetterstrand, christopher donohue, luis a. nunes amaral, and thomas stoeger. 2024. "ethical considerations in utilizing artificial intelligence for analyzing the nhgri's history of genomics and human genome project archives." journal of escience librarianship 13 (1): e811. https://doi.org/10.7191/jeslib.811.

data availability: the content and metadata extracted from the archival materials of nhgri are not available to be broadly shared due to the same ethical considerations discussed in this case study.

the journal of escience librarianship is a peer-reviewed open access journal. © 2024 the author(s). this is an open-access article distributed under the terms of the creative commons attribution 4.0 international license (cc-by 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. see https://creativecommons.org/licenses/by/4.0.
abstract continued

this work yielded a digital knowledge base of entities, topics, and documents that can be used to probe the inner workings of the human genome project, a massive international public-private effort to sequence the human genome, and several of its offshoots like the cancer genome atlas (tcga) and the encyclopedia of dna elements (encode). the resulting knowledge base will be instrumental in understanding not only how the human genome project and genomics research developed collaboratively, but also how scientific goals come to be formulated and evolve. given the diverse and rich data used in this project, we evaluated the ethical implications of employing ai and llms to process and analyze this valuable archive. as the first computational investigation of the internal archives of a massive collaborative project with multiple funders and institutions, this study will inform future efforts to conduct similar investigations while also considering and minimizing ethical challenges. our methodology and risk-mitigating measures could also inform future initiatives in developing standards for project planning, policymaking, enhancing transparency, and ensuring ethical utilization of artificial intelligence technologies and large language models in archive exploration.
summary

the national center for human genome research, the precursor to the national human genome research institute (nhgri), was created at the national institutes of health (nih) in 1989 to guide the us development of the human genome project (hgp), a watershed moment in biomedical research. recognizing the historic value of the hgp, nhgri preserved and archived a large number of internal documents from the hgp and subsequent genomics initiatives. presently, this archive, which is the only historic genomics and human genome project archive within nih, houses an estimated two million pages that include cost-benefit analyses, interim reports, grantee presentations, internal memos, strategy papers, internal working documents, server logs, presentations, emails and scanned letters among senior personnel and to key external stakeholders, and more—essentially anything that has been produced in relation to the institute's core mission of guiding and funding genomics. a collaboration between northwestern university's amaral lab, stoeger lab, galter health sciences library, and the history of genomics program at the nhgri now aims to computationally analyze and understand the decision-making processes behind a massive international public-private biomedical project (see figure 1). the origins and importance of the data motivated us to be especially cautious about the ethical considerations of opening and studying this rich archive with tools powered by artificial intelligence (ai). our observations could help members of the glam (galleries, libraries, archives, and museums) community navigate the numerous ethical challenges of using ai and large language models (llms) when processing and exploring archives.
future efforts could gain insights from our approach to establish best practices for project planning, policy-making, fostering transparency, and promoting responsible use of ai when exploring archives.

project development

human genome project (hgp)

the initial sequencing and analysis of a human genome has been described as a transformative moment in the history of biological research (hood and rowen 2013). knowing the sequence of nucleic acids within human dna promised to accelerate the identification of human genes. knowledge of the human genome's sequence enabled genomics researchers to monitor a near-complete set of genes within a single experiment, rather than studying them individually. this helped to better characterize the genetic architecture of health and disease, as well as common and rare genetic conditions, yielded insights into the origins and history of our species, and enabled novel types of studies. since then, a significant number of polymorphisms within genes have been discovered, and their biological and medical significance evaluated. the initial sequencing of the human genome was achieved through two competing efforts: an international consortium, called the international human genome sequencing consortium (ihgsc), and a commercial effort, spearheaded by celera genomics, with the latter initially striving to patent the genes discovered during their sequencing efforts. on june 26th, 2000, members of both initiatives, together with national and international politicians, publicly announced the initial completion of their efforts in a press conference at the united states white house, where then-president bill clinton noted, "we are here to celebrate the completion of the first survey of the entire human genome.
without a doubt, this is the most important, most wondrous map ever produced by humankind" ("june 2000 white house event" 2012). simultaneously, clinton also pointed out the inherent ethical considerations that surround research into the human genome: "... increasing knowledge of the human genome must never change the basic belief on which our ethics, our government, our society are founded."

figure 1: collaboration timeline between the history of genomics program of the national human genome research institute (nhgri), northwestern university's amaral lab, and the galter health sciences library.

project's scope and foundational phase

our project has been exploring the history of the hgp as directed by the nhgri between c. 1993 and c. 2008. during this period, scientists achieved transformative advances in mapping and sequencing technology, as well as significant developments in organismal and comparative sequencing, disease gene mapping, and insights into the genetic architecture of health and disease. this period also included the completion of various iterations of the genome sequence and the development of sundry genomics programs. examples include the publication of the initial draft sequencing results in 2001, the formal conclusion of the project in 2003, and the development of landmark genomics programs such as the international hapmap project, which developed a genome-wide comparative map of human variation ("international hapmap project" 2012). in 2012, eric green, who had been appointed nhgri director in 2009, proposed a unique history program at the nhgri not only to preserve and analyze these materials, but to develop an international effort to promote scholarship into the history of the hgp using these materials. led by dr.
christopher donohue and kris wetterstrand, ms, acting as the liaison to the division of extramural research, the history of genomics program continues to preserve the history of the hgp and promote research into its rich history. a major step in the preservation of the records was the digitization of the day-by-day programmatic aspects of hgp guidance and funding, which later moved to the preservation of shared documents and communications on follow-up efforts to the hgp after 2003. the vast majority of the hard-copy materials were digitized via bulk scanning at 300 dpi using a panasonic high-speed scanner. some materials, particularly those related to the development of sequencing technology, were digitized by "hand scanning" at a much higher resolution. most, if not all, of the computationally investigated materials are the product of bulk scanning or are "born digital" resources such as those created using microsoft word. though these preservation efforts were primarily directed toward the nih and nhgri, the archive also has limited representation from other organizations involved in the project (e.g., the department of energy, celera genomics). the motivation to implement the structured digitization of the materials in collaboration with scientists outside of the nih grew from the specific shared interests and competencies of those involved. the current project began in february 2019, when dr. christopher donohue invited dr. thomas stoeger to present a guest lecture at the nhgri's history of genomics and molecular biology lecture series on his quantitative research into the natural and social factors that underlie research into individual human genes. follow-up discussions created an opportunity for collaboration and for extending computational approaches for studying scientific literature toward archival documents surrounding the hgp.
technical and regulatory requirements

over the subsequent months, drs. donohue and stoeger used a set of ~1,000 documents to develop an initial project proposal and outline, and piloted feasibility studies. the goal of the pilot studies was to show the prospects of this endeavor and gain support for a larger effort within the nhgri. a key insight was that the project needed relatively little computing power, and that a local workstation could be used to avoid the risk of potential security breaches that could occur in a decentralized infrastructure. furthermore, it was hypothesized that tools that had been developed in other scientific contexts, such as gensim (for automated keyword extraction) and doc2vec (to organize documents by similarity), could be reused for some of the necessary tasks. importantly, it did not escape their attention that the data contained instances of potentially sensitive information (such as potential interventions in clinical genetics and broad discussions of health-related questions) that could possibly be retrieved through very specific queries. one solution was to limit the possibility of malicious queries by conceptually separating data preprocessing and data analysis. accordingly, different redaction procedures, such as masking sensitive documents, people's names, and timestamps, were implemented. furthermore, guidelines that allowed the nhgri to forbid certain usage scenarios were refined in a later stage of the project. this initial technical proof and the resulting design of the computational and data workflow were essential to create demos and gain support for the project within the nhgri. later, an application was submitted to the institutional review board (irb) at northwestern university for the development of a content extraction platform. the project was deemed low-risk and thus exempt.
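a minimal sketch of the masking step in python: here a fixed list of names and a simple date pattern stand in for the project's actual redaction procedures, which the case study does not specify in code.

```python
import re

# hypothetical examples; the real pipeline identified names and
# timestamps with more sophisticated methods than a fixed list
KNOWN_NAMES = ["jane doe", "john smith"]
TIMESTAMP = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

def redact(text):
    """mask people's names and timestamps so that downstream analysis
    queries cannot retrieve them."""
    for name in KNOWN_NAMES:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return TIMESTAMP.sub("[DATE]", text)
```

running redaction during preprocessing, before any analysis code touches the corpus, is what enforces the conceptual separation between the two stages.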
scale-up phase

in order for the project to progress, it became necessary to have someone dedicate their full attention to it, something that neither dr. donohue nor dr. stoeger could afford. fortunately, an incoming graduate student in the amaral lab and the chemical and biological engineering graduate program at northwestern university—spencer hong—was interested in spearheading the research. the first step in scaling up the project was to go from ~1,000 initial documents to ~20,000 (corresponding to 3,894 files), which had been frequently accessed or requested by historians of science. this "core" sample served as a valuable and well-reviewed corpus that is representative of the document types and scientific projects in the entire archive, which we estimate to be around 2 million pages. the scaling up created several challenges. on the technical level, we had to create novel algorithms and approaches for content extraction that were outside the predicted scope of the project. for instance, models for scan segmentation (a task where multi-document scans are separated into their logical boundaries) did not exist at the start of the study, and only recently did llms become widely available and enable us to create sufficiently high-quality synthetic data for model training. training deep learning models required large computer resources that would normally be provided by an institutional cluster (e.g., quest at northwestern) or cloud services (e.g., aws). however, the data use agreement between northwestern and the nhgri mandated that the data stay isolated on a local, encrypted drive. therefore, the northwestern team received an in-equipment grant from nvidia that supplied two large gpus, each with enough memory to train large language models.
a dedicated workstation was built with encrypted drives and a fail-safe backup; with no other projects or users on the workstation, we ensure privacy and protection for the nhgri archive data. this workstation houses all of the analyses and models created in the project. instead of relying on cloud services, we used software alternatives and models that could be set up locally to avoid compromising data privacy. when drawing from improvements in external algorithms and computational tools, one discrepancy remains: in preservation contexts, metadata for the historical archives of federal agencies may need to follow additional metadata standards defined by other parties, such as the u.s. national archives and records administration (nara) metadata requirements (transaccess 2019). this issue was pointed out by zachary utz, ma, archivist and public historian based at the nhgri history of genomics program at the nih. while it is possible to implement such standards, we have not yet done so because we were concerned that an early focus on these requirements might discourage a greater exploration of computational approaches to extract metadata. we anticipate that later stages of the project will involve following specific nara requirements, and the maturity of the project will be determined by adherence to such standards. the galter health sciences library at northwestern university's feinberg school of medicine has played a unique role in this project by providing guidance on how to manage the data in light of the new nih data management and sharing policies (gonzales et al. 2022; hughes et al. 2023), and in anticipating and avoiding ethical issues that are common in social science projects involving large digitized datasets (hosseini et al. 2022). one conceptual challenge was how to best include ethics as an integral part of the project rather than an afterthought. at this point dr.
hosseini, an ethicist in northwestern's department of preventive medicine and based at galter library, offered assistance in developing a plan to conduct the study in accordance with ethical norms and best practices.

background

the primary reason for digitization was to improve the accessibility of the archive and to facilitate subsequent studies. potential applications range from identifying documents of interest to historians, to enabling large contextual analyses that would be painstakingly manual and labor-intensive through close reading. future explorations could investigate specific aspects of the hgp (e.g., how different centers were led, or the degree to which governance and innovation of the project changed over time), or use the hgp as a model for understanding the contemporary history of science and institutional decision making. some of the technical challenges we faced in this project arose from the lack of preceding studies with similar characteristics, which required developing novel tools and building connections between existing tools. moreover, while the sample corpus is large in terms of the manual effort required for its reading, it is nonetheless too small for training cutting-edge machine learning approaches. we solved this challenge by adopting the use of synthetic data, which additionally freed us from concerns about privacy, scale, or data diversity. for this purpose, and depending on the task, we used synthetically arranged document boundaries, superimpositions of handwriting, and other features observed in the corpus. at a non-technical level, the lack of precedents also required developing guidelines and procedures that respect the interests of involved parties and the privacy of individuals mentioned in the archive.
though individuals' identities are usually known to historians of science, computational data mining could, for instance, reveal potentially private or stigmatizing information, and it was our ambition to prevent such instances. we started with existing tools such as gensim (a python library for document-related analyses) and spacy (an open-source software library for advanced natural language processing) for content extraction. these tools generally worked well but required data preprocessing to remove handwriting. additionally, we developed customized machine learning annotation tools to review models and refine them. for guidance and inspiration, we also considered the archivesspace platform for creating an interlinked database of metadata and documents, and the de-identification tools of amazon web services for masking sensitive information and preventing transfer to another storage location.

ethical considerations

breach of confidentiality and harming subjects and/or their reputation

the availability of large-scale, machine-readable data enables the successful implementation of queries that could retrieve private or sensitive information and harm individuals captured in such data. experts have in the past explored and discussed this issue using specially trained, fine-tuned models (ahmed et al. 2021; meystre et al. 2010). however, complete de-identification remains a challenging issue, one that has become more fraught with ai advancement and increased data access. user data, even when de-identified, have been shown to be re-identifiable (barbaro and zeller jr. 2006; narayanan and shmatikov 2008), and even aggregate statistics about a dataset can undermine de-identification efforts (dick et al. 2023). as scholars wishing to computationally study documents that capture numerous individuals, we wished to avoid a scenario in which we inaccurately state that our data was fully de-identified when it realistically was not.
our data's origins (from the early 1990s to the late 2000s) and content (informal and formal correspondence, handwritten notes, receipts, and other paperwork that would exist in any large organization) introduce colloquial, conversational, and informal language that individuals may or may not choose to be associated with. documents like handwritten notes, email correspondence about non-scientific topics, and recorded personal communications may contain information that could bring personal and social harms to subjects captured in the archive.

the particular risks associated with handwriting

we can reasonably anticipate that titles, handwriting, and timestamps may all risk reidentification. handwriting, specifically, may facilitate reidentification through its text as well as its style. indeed, handwriting has been shown to be unique to individuals (faundez-zanuy et al. 2020), and the scale of the archive would be enough for a machine to trace a handwriting style back to an individual. furthermore, handwritten text still often escapes traditional text-recognition engines, which means that protected health information (phi) included in handwritten text does not get flagged by existing phi detection models. the nature of handwritten text indicates a colloquial, or informal, usage of language, leading to writing more specific to the individual than professional use of language in the workplace. lastly, because of the abundance of email correspondence in the archive, the timestamps could be used to generate approximate chains of existing text, which may create longer threads of text than existing standalone text and thereby help reidentify individuals. for these reasons, complete de-identification of the archive may be infeasible, and even undesirable, as it could result in uncontrolled reuse.
for instance, declaring a dataset de-identified might make it seem suitable for the use of ai and computational methods that could ultimately result in reidentification. such instances would indicate a lack of awareness of misuse and accountability (shahriari and shahriari 2017). therefore, our team decided to move to encoding key phi and implemented explicitly stated analyses through the northwestern irb, as detailed below.

privacy and consent

this archive is a collection of internal documents spanning from the nascent beginnings of the hgp to the modern genomic initiatives led by nhgri. unlike other scientific or bibliographic databases, the nhgri archive houses documents created as a byproduct of knowledge production, including correspondence, emails, memos, and notes from the leadership, administration, and employees of the nih and others involved in the hgp.1 therefore, the archive's contents capture individuals of varying degrees of reputation and prominence, from francis collins, former nhgri director (1993–2008), to staff and representatives from external organizations.

1 our approach is inspired by principles and standards offered by the international council on archives (ica) and the open archival information system (oais) reference model on how to regulate the process of preserving and maintaining access to stored digital information over the long term.

individuals who worked on the hgp and subsequent initiatives have neither been approached nor agreed to have their email conversations or handwritten notes analyzed using ai and machine learning tools.2 even if all data were fully de-identified and there were no risk of harm, we would still be analyzing user data without explicit consent. this could be viewed as undermining subjects' autonomy and a harm in and of itself. furthermore, the dataset includes both nih and non-nih employees.
the nih, as a federal agency under the department of health and human services, has explicit legal guidelines about the data that is kept during employment. as a government entity, the nih has the responsibility to archive and preserve any and all documents deemed crucial. as a condition of employment, all nih-affiliated individuals have consented to these recordkeeping procedures. however, the same cannot be said for involved individuals who were not employed by the nih. for example, individuals at other academic institutions, at private companies, and in other nation-states may have different recordkeeping policies, cultural norms, and social expectations about what is reasonable to keep long term. by studying the documents created by these individuals without their consent, we may be infringing on their privacy and autonomy (insofar as they have no control over what happens to their information). we do not have consent from all of those whose data is in the archive, nor can we acquire consent, for the following reasons:

• the time period covered by the archive spans the early 1990s to the late 2000s; many individuals mentioned in the database have since retired, are deceased, or are otherwise unreachable.

• individuals captured in the nhgri archive include both nih employees and scientists from external organizations and foreign countries. neither the nhgri nor the nih has access to current contact information for all of these individuals.

• the nhgri archive consists of both public (e.g. published papers, congressional bills) and private (e.g. email conversations, informal drafts) documents. therefore, it is not always possible to ascertain which individuals mentioned in a document would have to provide consent and which would not (e.g. because their data is already in the public domain).
2 individuals who participated in the hgp, from the scientists who supported them to the institutions that funded them, are spread across many different institutions and nations. their work and the corresponding material that came to be archived are therefore subject to different, and at times conflicting, guidelines for future archival, analysis, and storage. without consistent and explicit consent from all of the individuals mentioned in the archive, one could view this as undermining the subjects' autonomy. we believe this to be an issue common to many other archival situations (such as cold spring harbor) where the mentioned individuals' affiliations span beyond the institution that is doing the archiving.

after engaging with the data owners, machine learning experts, and data librarians, we identified two acceptable scenarios:

• conduct de-identification and declare the nhgri archive de-identified.

• detect and encode protected private information with a key, and engage with the northwestern irb to render it encoded data.

our team opted for the latter solution. as described earlier, many remnants of data outside traditional phi schemes can reidentify individuals in a de-identified dataset. therefore, claiming a dataset to be fully de-identified would have downstream consequences: if considered fully de-identified, a dataset could be deemed appropriate for publication and sharing. furthermore, the nature of the nhgri archive makes it more difficult than other datasets to fully de-identify: the multi-modal and heterogeneous (text and image) data, handwriting, unstructured layouts, and informal language all make full de-identification of the nhgri archive a daunting task, if possible at all.
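in outline, the keyed-encoding scenario amounts to deterministic pseudonymization: each detected phi value is replaced by a stable code derived from a secret key, so repeated mentions stay linkable for analysis while recovering the original requires a separately held mapping. the following minimal python sketch is one plausible illustration of the idea, not the project's actual pipeline; the patterns, key, and token format are invented:

```python
import hashlib
import hmac
import re

# hypothetical secret key; in practice it would be generated and stored
# separately from the archive (e.g. by the data stewards)
SECRET_KEY = b"store-this-key-separately"

# toy patterns for two of the phi categories named in the case study
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def encode_value(category: str, value: str) -> str:
    """replace a detected value with a stable keyed code: the same value
    always maps to the same code (so co-mention analyses still work), but
    recovering the original requires a separately stored mapping."""
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:10]
    return f"[{category}-{digest}]"

def encode_document(text: str) -> str:
    """scan one document and encode every detected phi value in place."""
    for category, pattern in PATTERNS.items():
        text = pattern.sub(lambda m: encode_value(category, m.group()), text)
    return text
```

in the actual project, detection relied on fine-tuned named entity recognition models rather than simple regular expressions, and the key material would live with the data stewards rather than in code.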
by choosing to engage with the irb to render this data “encoded,” we acknowledged some of the aforementioned challenges and proposed addressing them using institutional safeguards. we employed several measures to minimize the risk of a confidentiality breach. these included encoding possible names, physical addresses, social security numbers, credit card numbers, and email addresses in the nhgri archive. we then developed in-house models to isolate and remove handwriting from all digitized documents in the archive. because there are documented instances of handwritten names, credit card numbers, and email addresses, handwriting is a source of phi. however, as noted, traditional ocr engines like tesseract cannot recognize handwritten characters, so handwriting escapes de-identification efforts. by isolating handwritten sections, we can both remove handwritten content and send it to a separate pipeline dedicated to handwriting recognition (li et al. 2022). although we have confirmed that separate processing is technically possible, we currently abstain from it because of the time required to review the extracted metadata and ensure that no sensitive information is released. it remains unclear how much information is lost by removing handwritten notes. we estimate that around a third of the documents contain some form of handwriting (e.g. fully handwritten pages or other markings), which could add information beyond the printed content of these documents. furthermore, we fine-tuned existing named entity recognition models to include more entity categories. in many de-identification methods, either existing entity recognition datasets (balasuriya et al. 2009; sang and de meulder 2003) are used, or a major portion of the dataset-to-be-de-identified is labeled for fine-tuning.
however, these methods present several major challenges: the text in existing datasets is out-of-domain and does not fit the language in the dataset of interest; labeling parts of the dataset for fine-tuning risks exposing, during the annotation cycle, the very individuals we wish to de-identify; and labeling datasets for fine-tuning is highly manual and unsustainable. therefore, we explored the use of synthetic entities, generated by rules and llms, to help fine-tune a pre-trained model. echoing concerns raised in the community about the collapse of pre-trained models (shumailov et al. 2023) and algorithmic bias (yu et al. 2023), we found that fine-tuning entity detection models on in-domain nhgri text was essential to reliably detect entities within the archive. as a final step to ensure the security and privacy of the data, we placed it on an isolated and encrypted workstation to minimize exposure to malicious agents. typically, ai approaches require powerful cloud-based computing platforms. however, we avoided the risks posed by cloud resources by ensuring that this isolated workstation has enough computing power (gpus, ram, storage) to train state-of-the-art models without the need for cloud services.

ensuring maximum data security

the nhgri history of the human genome project and genomics archive resides outside of nhgri's division of intramural research and the extramural research program, the two main branches through which the nih supports research. this necessitates setting up hierarchical access levels and precautionary measures (e.g. a data transfer agreement and a non-disclosure agreement) to ensure maximum data security.
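stepping back to the fine-tuning strategy described above: at its simplest, synthetic entity generation slots rule-generated (or llm-generated) fake values into in-domain sentence templates, yielding labeled spans without exposing real individuals during annotation. the sketch below is a hypothetical illustration; the templates, names, and labels are invented, not drawn from the archive:

```python
import random

# hypothetical in-domain templates; in practice these could be written to
# echo archive language or generated by an llm
TEMPLATES = [
    "please forward the memo to {PERSON} at {ORG}.",
    "{PERSON} raised the budget question during the {ORG} call.",
]

# synthetic fillers: rule-generated here; no real individuals appear
FILLERS = {
    "PERSON": ["maria alvarez", "john whitfield"],
    "ORG": ["the sequencing center", "the advisory council"],
}

def make_example(rng: random.Random):
    """render one template into (text, spans), where spans are
    (start, end, label) character offsets usable for ner fine-tuning."""
    text = rng.choice(TEMPLATES)
    spans = []
    for label, values in FILLERS.items():
        placeholder = "{" + label + "}"
        start = text.find(placeholder)
        if start == -1:
            continue
        value = rng.choice(values)
        text = text.replace(placeholder, value, 1)
        spans.append((start, start + len(value), label))
    return text, spans
```

each (text, spans) pair can then feed a standard token-classification fine-tuning loop for the entity detector.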
for example, we collaboratively decided to limit the access of researchers at northwestern to around 20,000 documents that have been reviewed by nhgri. in addition, some of these precautionary measures might prevent full adherence to the fair (findable, accessible, interoperable, reusable) principles and require redefining some of these principles to fit the contingencies of our project. for example, metadata about the content of documents is being created. because this metadata could contain sensitive information, it will only be accessible to other scholars after they sign a data usage agreement. while, strictly speaking, this decision is not in line with the fair principles, it is a reasonable tradeoff because it helps mitigate risks, ensure ethical compliance, and maintain data privacy standards while enabling valuable research outcomes.

responsibility and accountability

as in other collaborations between humans and ai, responsibilities and accountabilities are subject to diffusion. since ai is trained by humans, human collaborators are essentially also responsible for ai mistakes during the de-identification process. this is an ongoing issue that cannot be fully resolved. if the methods used in this project are reused by other scholars without the due diligence and risk-mitigating measures adopted here, or worse, by malicious actors, we enter a gray zone of accountability. while members of this project cannot be held liable for others' mistakes, negligent scholars or malicious actors could claim that they merely used available tools and data and should not be held to account either, creating an accountability impasse. to mitigate these risks, when posting the code and methods, we will develop and publish a brief “expectation of use” document that explains how the developed methods and tools should be used and what considerations should be observed in future studies to prevent harm.
who is impacted by this project?

this ai-powered content extraction and subsequent computational analysis impact the individuals discussed, mentioned, and captured in the nhgri archive. these individuals vary in their roles, ranging from the top leadership of various departments at the nih to intramural and extramural scientists and staff. one expected outcome of this project is to gain a data-driven understanding of the practical operation of the scientific mechanisms inside funding institutions. the nhgri archive contains materials regarding the conception, clearance, and approval of large projects inside a focused institute. using machine learning models, we can explore the factors that drove decisions and investigate how the nascent beginnings of large projects emerged. in so doing, we would be able to identify inefficiencies, biases, or other factors that affected project timelines and final outcomes. this knowledge would benefit future administrators and the broader program evaluation services inside funding institutions. this work is the first computationally driven analysis of a large internal archive of a major scholarly funding agency. we anticipate that our implementation will serve as an example for other archivists and library administrators in funding agencies who wish to study their own materials. therefore, the procedures and policies set in place in this study will affect not only users of this data, but potentially owners and users of other archives who follow our implementation. we have already garnered attention and interest from other large archives in the biomedical sciences that wish to enhance their data in similar ways: data that also captures individuals and may expose them to harm if studied without careful procedures in place.
lessons learned and future work

we recognize that there is a difference between our own use of ai and llms and the possible future uses of these tools enabled by our research and the resulting machine-readable representation of the archive. further, we cannot rule out the possibility that someone would use the data for malicious purposes that are presently unforeseeable to us. complementing technical solutions to de-identify individuals, we therefore set up a process that integrates into the current regulatory scheme developed by the nih, restricting the scope of use and limiting various kinds of analyses. a possible path is to explicitly state what analyses can be conducted on this corpus (e.g. the types of work, such as various kinds of global network analysis, that are legally and regulatorily allowed with the provided data), to exclude the rest, and to specify the information that researchers are allowed to disclose.

documentation

our consideration of the ethical aspects of the project culminated in a submission to northwestern university's irb prior to engaging in specific research-based analyses. the irb categorized our study as low-risk, and therefore exempt. the workflow described in the irb application is shown in figure 2.

data availability

the content and metadata extracted from the archival materials of nhgri cannot be broadly shared, due to the same ethical considerations discussed in this case study. the encoded metadata created from this project and the developed tools and models will be shared with a publication now in preparation. at present, access to the raw documents can be requested directly through the history of genomics program at the national human genome research institute.

acknowledgements

at northwestern university, luis a.
nunes amaral led a university-wide initiative to promote collaboration in data science and artificial intelligence (ai); in parallel, mohammad hosseini and kristi holmes advance ai ethics efforts through the institute for artificial intelligence in medicine (i.aim) at northwestern, and research and training synergies through the galter health sciences library and the northwestern university clinical and translational sciences (nucats) institute. at nhgri, ai efforts are presently consolidating; as part of this process, additional intramural research groups are being formed, including one led by christopher donohue. mohammad hosseini and kristi holmes were supported by the national institutes of health's national center for advancing translational sciences ul1tr001422 and the nih office of data science strategy / office of the nih director pursuant to ota-21-009, “generalist repository ecosystem initiative (grei),” through other transactions agreement (ota) number ot2db000013. thomas stoeger was funded by k99ag068544 and r00ag068544 of the national institutes of health national institute on aging. the funders have not played a role in the design, analysis, decision to publish, or preparation of the manuscript. the research case study was developed as part of an imls-funded responsible ai project, through grant number lg-252307-ols-22.

figure 2: the irb-approved process to handle the nhgri archive data, including the transfer, handling, encoding, and usage of the documents.
author contributions

mohammad hosseini: investigation; project administration; writing – original draft; writing – review & editing
spencer hong: conceptualization; data curation; investigation; methodology; software; visualization; writing – original draft; writing – review & editing
thomas stoeger: conceptualization; investigation; project administration; supervision; writing – original draft; writing – review & editing
kristi holmes: funding acquisition; supervision; writing – review & editing
luis a. nunes amaral: funding acquisition; supervision; writing – review & editing
christopher donohue: conceptualization; project administration; resources; supervision; writing – original draft; writing – review & editing
kris wetterstrand: conceptualization; funding acquisition; project administration

competing interests

the authors declare that they have no competing interests.

references

ahmed, abdullah, adeel abbasi, and carsten eickhoff. 2021. “benchmarking modern named entity recognition techniques for free-text health record deidentification.” amia joint summits on translational science proceedings 2021: 102–111. https://pubmed.ncbi.nlm.nih.gov/34457124.

balasuriya, dominic, nicky ringland, joel nothman, tara murphy, and james r. curran. 2009. “named entity recognition in wikipedia.” in proceedings of the 2009 workshop on the people's web meets nlp: collaboratively constructed semantic resources (people's web), 10–18. suntec, singapore: association for computational linguistics. https://aclanthology.org/w09-3302.

barbaro, michael, and tom zeller jr. 2006. “a face is exposed for aol searcher no. 4417749.” the new york times, august 9, 2006, sec. technology. https://www.nytimes.com/2006/08/09/technology/09aol.html.

dick, travis, cynthia dwork, michael kearns, terrance liu, aaron roth, giuseppe vietri, and zhiwei steven wu. 2023.
“confidence-ranked reconstruction of census microdata from published statistics.” proceedings of the national academy of sciences 120 (8): e2218605120. https://doi.org/10.1073/pnas.2218605120.

faundez-zanuy, marcos, julian fierrez, miguel ferrer, moises diaz, ruben tolosana, and réjean plamondon. 2020. “handwriting biometrics: applications and future trends in e-security and e-health.” cognitive computation 12 (september). https://doi.org/10.1007/s12559-020-09755-z.

“genome project.” n.d. civilization wiki. accessed march 31, 2023. https://civilization.fandom.com/wiki/genome_project.

gonzales, sara, matthew b. carson, and kristi holmes. 2022. “ten simple rules for maximizing the recommendations of the nih data management and sharing plan.” plos computational biology 18 (8): e1010397. https://doi.org/10.1371/journal.pcbi.1010397.

hood, leroy, and lee rowen. 2013. “the human genome project: big science transforms biology and medicine.” genome medicine 5 (9): 79. https://doi.org/10.1186/gm483.

hosseini, mohammad, michał wieczorek, and bert gordijn. 2022. “ethical issues in social science research employing big data.” science and engineering ethics 28 (3): 29. https://doi.org/10.1007/s11948-022-00380-7.

hughes, laura d., ginger tsueng, jack digiovanna, thomas d. horvath, luke v. rasmussen, tor c. savidge, thomas stoeger, et al. 2023. “addressing barriers in fair data practices for biomedical data.” scientific data 10 (1): 98. https://doi.org/10.1038/s41597-023-01969-8.

“international hapmap project.” may 01, 2012. genome.gov.
accessed november 13, 2023. https://www.genome.gov/10001688/international-hapmap-project.

“june 2000 white house event.” august 29, 2012. genome.gov. accessed november 13, 2023. https://www.genome.gov/10001356/june-2000-white-house-event.

li, minghao, tengchao lv, jingye chen, lei cui, yijuan lu, dinei florencio, cha zhang, zhoujun li, and furu wei. 2022. “trocr: transformer-based optical character recognition with pre-trained models.” arxiv. https://doi.org/10.48550/arxiv.2109.10282.

meystre, stephane m., f. jeffrey friedlin, brett r. south, shuying shen, and matthew h. samore. 2010. “automatic de-identification of textual documents in the electronic health record: a review of recent research.” bmc medical research methodology 10 (1): 70. https://doi.org/10.1186/1471-2288-10-70.

narayanan, arvind, and vitaly shmatikov. 2008. “robust de-anonymization of large sparse datasets.” in 2008 ieee symposium on security and privacy (sp 2008), 111–125. https://doi.org/10.1109/sp.2008.33.

sang, erik f. tjong kim, and fien de meulder. 2003. “introduction to the conll-2003 shared task: language-independent named entity recognition.” arxiv. https://doi.org/10.48550/arxiv.cs/0306050.

shahriari, kyarash, and mana shahriari. 2017. “ieee standard review — ethically aligned design: a vision for prioritizing human wellbeing with artificial intelligence and autonomous systems.” in 2017 ieee canada international humanitarian technology conference (ihtc), 197–201. https://doi.org/10.1109/ihtc.2017.8058187.

shumailov, ilia, zakhar shumaylov, yiren zhao, yarin gal, nicolas papernot, and ross anderson. 2023. “model dementia: generated data makes models forget.” arxiv. https://doi.org/10.48550/arxiv.2305.17493.

transaccessdm. 2019. “spotlight: nara's metadata requirements for electronic records.” transaccess (blog). july 8, 2019. https://www.transaccessdm.com/2019/07/spotlight-nara-metadata-requirements-for-electronic-records.
yu, yue, yuchen zhuang, jieyu zhang, yu meng, alexander ratner, ranjay krishna, jiaming shen, and chao zhang. 2023. “large language model as attributed training data generator: a tale of diversity and bias.” arxiv. https://doi.org/10.48550/arxiv.2306.15895.

the implementation of keenious at carnegie mellon university

journal of escience librarianship 13 (1): e800 doi: https://doi.org/10.7191/jeslib.800 issn 2161-3974

full-length paper

joelen pastva, carnegie mellon university, pittsburgh, pa, usa, jpastva@andrew.cmu.edu
dom jebbia, carnegie mellon university, pittsburgh, pa, usa
maranda reilly, carnegie mellon university, pittsburgh, pa, usa
ashley werlinich, carnegie mellon university, pittsburgh, pa, usa

abstract

in the fall of 2022, the carnegie mellon university (cmu) libraries began investigating keenious, an artificial intelligence (ai)-based article recommender tool, for a possible trial implementation to improve pathways to resource discovery and assist researchers in more effectively searching for relevant research.
this process led to numerous discussions within the library regarding the unique nature of ai-based tools when compared with traditional library resources, including ethical questions surrounding data privacy, algorithmic transparency, and the impact on the research process. this case study explores these topics and how they were negotiated up to and immediately following cmu's implementation of keenious in january 2023, and highlights the need for more frameworks for evaluating ai-based tools in academic settings.

received: october 2, 2023 accepted: february 5, 2024 published: march 5, 2024

keywords: keenious, artificial intelligence, ai, libraries, recommender, ethics

citation: pastva, joelen, dom jebbia, maranda reilly, and ashley werlinich. 2024. “the implementation of keenious at carnegie mellon university.” journal of escience librarianship 13 (1): e800. https://doi.org/10.7191/jeslib.800.

data availability: assessment plan survey questions are available under the article supplementary files.

the journal of escience librarianship is a peer-reviewed open access journal. © 2024 the author(s). this is an open-access article distributed under the terms of the creative commons attribution-noncommercial 4.0 international (cc by-nc 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium non-commercially, provided the original author and source are credited. see https://creativecommons.org/licenses/by-nc/4.0.
background

as an american research university focused on ai, engineering, and robotics, cmu is uniquely positioned to explore emerging ai tools in the library as well as the laboratory. cmu libraries, like the university itself, prioritizes innovation; as stated in the cmu libraries strategic plan, the central goal of our work is to “create a 21st century library that serves as a cornerstone of world-class research and scholarship” (carnegie mellon university n.d.). in order to enrich the scholarly information ecosystem, our librarians seek out new tools and resources that could change our information-seeking networks for the better. as such, we have long been interested in improving the resource discovery process, and we strive to find better ways to point users to relevant research made available by the library while also reinforcing information literacy best practices. a common problem for libraries is that database search engines require some baseline knowledge of a topic to find relevant content, which can be challenging for inexperienced researchers, who may feel overwhelmed when searches return millions of results. even when content appears to be useful, it often takes a significant investment of time to read portions of articles in order to determine relevance, which can be a daunting process. libraries have also increasingly acknowledged that research begins outside of the library, and any tools that can improve the research process while also pointing back to library resources are highly desirable (frederick and wolff-eisenberg 2020).
hoping to address these issues, we were intrigued by keenious, a recommender tool that uses search algorithms and ai to analyze input text and suggest relevant academic articles. as an article recommender that takes text and documents of interest as its starting point, keenious is well positioned to take the guesswork out of how to initiate searches. we also view keenious as an opportunity to create new pathways to library-subscribed resources because of its integration with the library's link resolver, improving our ability to link to full-text content. additionally, the use of topics in keenious to encourage exploration of related content had the potential to train users on the benefits of controlled vocabularies for improved and reproducible search results. finally, the keenious plugins for microsoft word and google docs offer multiple ways to meet researchers where they are and to integrate with the research and writing process rather than pushing users to an outside tool or website. in preliminary discussions surrounding keenious, our implementation team sought to identify any similar or comparable tools to better frame a needs assessment. word processor plugin features that integrate with research and writing (such as the citation management tools refworks and zotero) have been well received, and we felt this keenious feature would prove similarly useful. for article recommendations, the libraries have access via the primo discovery tool to ex libris's bx, which recommends articles based on link resolver usage data collected across discovery systems, databases, and publisher platforms, and which claims to be platform- and content-neutral (ex libris n.d.).
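for readers unfamiliar with text-driven recommendation of the kind described above, the core retrieval idea can be illustrated with a minimal tf-idf and cosine-similarity sketch. this is a toy illustration only, not keenious's proprietary algorithm; the titles and abstracts are invented:

```python
import math
from collections import Counter

def vectorize(texts):
    """build a tf-idf weighted bag-of-words vector for each text."""
    tokens = [t.lower().split() for t in texts]
    n = len(tokens)
    df = Counter(w for toks in tokens for w in set(toks))
    return [
        {w: c * math.log(n / df[w]) for w, c in Counter(toks).items()}
        for toks in tokens
    ]

def cosine(a, b):
    """cosine similarity between two sparse vectors (dicts)."""
    dot = sum(wt * b.get(w, 0.0) for w, wt in a.items())
    norm_a = math.sqrt(sum(wt * wt for wt in a.values()))
    norm_b = math.sqrt(sum(wt * wt for wt in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(query_text, titles, abstracts):
    """rank candidate articles by similarity of their abstracts to the
    text a researcher is currently writing."""
    vectors = vectorize(abstracts + [query_text])
    query_vec, doc_vecs = vectors[-1], vectors[:-1]
    ranked = sorted(
        zip(titles, doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [title for title, _ in ranked]
```

production recommenders layer learned embeddings, topic models, and relevance feedback on top of a retrieval core like this one.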
similar to other features offered by content platforms such as scopus, bx 1) requires users to have run a successful search, 2) is a passive feature built into a larger discovery interface, and 3) offers no ability to refine recommendations and no transparency regarding the recommendation process. usage data from cmu's primo analytics indicated that bx was not often utilized for content discovery in the primo environment. we felt that keenious, as a standalone, portable utility with filtering, searching, and citation generation, would be more readily adopted by cmu users. although ai technology is increasingly likely to be integrated with modern research tools, some tools more prominently highlight ai as a performance indicator or selling point. cmu libraries subscribes to third iron's ai-powered libkey suite, which simplifies direct linking to library-subscribed content through searches originating in primo (libkey discovery), library databases (libkey link), and external searches such as google via a browser extension (libkey nomad). based on usage data, libkey has contributed to a noticeable improvement in the utilization of library electronic resources. the functionality of libkey as a linking solution is viewed as complementary to resource recommender tools such as keenious, with each addressing a separate but related problem in the research process. a separate team at cmu libraries also intends to trial scite.ai, a platform that utilizes deep learning to classify and contextualize article citations.

project details

we initially learned about keenious in the fall of 2022 through direct vendor contact with the library's head of resource and discovery services and director of library services.
as potentially the first american university library to adopt keenious, we did not have the usual local network of peer institutions to consult with questions or concerns. as the rollout of the general data protection regulation (gdpr) in the eu taught us, the data privacy landscape in the us differs enough from the eu's to limit our ability to lean on the experiences of european library adopters of keenious for direct comparison. instead, we largely had to develop our own metrics and practices to understand whether keenious would be useful to our users, identify the ethical questions associated with implementing the tool, and determine how to assess the tool as it evolves. we reached out to various stakeholders across library functional areas for feedback, including the library's collection advisory council and discovery access working group, before deciding to move forward with a year-long trial starting in january 2023. we assembled our implementation team to include representation across library departments: joelen pastva (director of library services), dom jebbia (library associate), maranda reilly (electronic resources manager), and ashley werlinich (liaison to english and drama). our diverse perspectives let us more effectively and conscientiously assess the implementation of keenious. a key component of our implementation was creating awareness about the tool and presenting it transparently, so users could understand its functionality and navigation with an informed eye. before implementation, we developed a plan to raise awareness about the tool among our librarians and the wider university community. this plan included the creation of a keenious libguide, several emails to our library instructor listserv, and an informational session for our library instructors on how to use keenious (pastva 2023).
by creating several pathways library instructors could use to seek information, we made sure liaison librarians would have a clear understanding of the tool and be able to represent it effectively and comfortably to our library users. in addition, the keenious libguide helped us represent the tool to faculty, student, and staff users who did not have an instruction session incorporating keenious into research demonstrations to guide their understanding. the technical implementation of keenious was managed by the electronic resources manager, and required standard information about cmu-specific email domains, ip ranges, and our link resolver to connect with library content. the director of library services promoted the tool via the library's social media accounts and website. our liaison to english and drama represented library instructors within our team's discussions around implementation, and developed an assessment plan alongside our library associate.

who is affected by this project?

the role of ai in academic libraries and the research data lifecycle has been a topic of discussion among scholars since the 1990s, as evidenced by works such as getz (1991). as society moves further into the fourth industrial revolution, interest in utilizing ai in library services has only continued to grow. information professionals implementing ai are keenly aware of the practical and ethical challenges that new technologies present (berendt et al. 2023; bubinger and dinneen 2021; cox, pinfield, and rutter 2019). despite these concerns, libraries around the world are deciding that the benefit to users outweighs the potential risks of ai if properly implemented (duncan 2022; ali, naeem, and bhatti 2021; andrews, ward, and yoon 2021; asemi and asemi 2018; panda and chakravarty 2022; r-moreno et al. 2014).
Although AI has repercussions for everyone in society and academia, the team implementing Keenious at CMU Libraries identified three groups likely to form the largest user base: librarians, researchers and faculty, and students.

Librarians

During the implementation of new AI tools in academic libraries, subject liaison librarians and information professionals are a crucial group to consider. Although they represent the smallest potential user group, they play a vital role in shaping how the library and its services are perceived by other members of the university community. As the primary point of contact for students, researchers, and academic departments, they form strong relationships that are critical to the success of the library. Subject librarians are also responsible for providing instructional assistance and research guides that faculty members in other departments rely on for their own instruction. Their understanding of Keenious will inform their constituencies' understanding of the product. It is therefore important to consider the ethical implications of AI for this group, since any new features or changes could have a ripple effect throughout the university.

One ethical consideration for subject liaison librarians is the impact of AI on their job responsibilities. AI tools have the potential to automate certain tasks, which could change the nature of liaison work and require new skills. Additionally, AI may introduce biases into the research process or make it more difficult to find and evaluate relevant sources. This can occur in a number of ways: the quality and diversity of training data have a major influence on bias in a neural network, but synthetic datasets and artificial diversity can also degrade a network's performance.
Numerous cultural biases are also introduced by the predominant use of English text in the pre-training corpora of many AI research tools. As artificial intelligence advances and automation becomes more sophisticated, researchers are increasingly relinquishing control to machine agents for resource discovery and other routine tasks. Students often lack the prerequisite knowledge to understand the nuances of a new subject, which makes it difficult for them to recognize biases and inaccuracies when using new tools. It is therefore important that library research guides and instructional materials actively engage with these technologies.

Furthermore, AI could affect the relationships that subject librarians have with their patrons. While AI can identify relevant resources and provide faster access to information, it lacks the human connection and capacity for substantive conversation that librarians bring to the research process. Subject librarians can model creative thinking and provide scaffolded instruction that complements AI tools, enhancing their services rather than replacing them. Overall, Keenious is a modest implementation of machine learning that creates a new kind of search engine, and it aligns well with librarians' role in resource discovery. The Keenious implementation team at CMU researched the product and communicated extensively with the Keenious product development team to understand how the product was built and how best to communicate its features to different audiences.

Faculty/Researchers

AI has significant implications for the professional obligations of faculty and researchers, particularly with regard to student instruction and scholarly publication. Student instruction is critical to the mission of universities because it trains future scholars who will contribute to the advancement of knowledge and society.
Likewise, scholarly publication is an important part of securing funding and reappointment for faculty members. AI has the potential to improve the speed and quality of research, allowing researchers to analyze vast amounts of data and make new discoveries. However, the use of AI in research also creates challenges for reproducibility and systemic bias. Reproducibility, a cornerstone of scientific inquiry, is critical to ensuring that research findings can be verified and validated; AI can make reproducing results more difficult, because the algorithms involved may be complex and hard to replicate. Additionally, AI can introduce systemic bias into research, with significant implications for the validity and reliability of findings. For example, if an AI algorithm is trained on data that is biased in some way, it may produce biased results that perpetuate existing inequalities or reinforce stereotypes.

When a library implements new AI services, it is important to have robust relationships with faculty power users who can provide input on how to integrate AI tools into their teaching and research. Faculty members are key drivers of innovation and change in academic institutions, and their support and expertise can be instrumental in ensuring that AI tools are used effectively and ethically. Moreover, faculty members play a critical role in propagating skills to the student body, which is why they need to be involved in the implementation and management of new AI services. By working collaboratively with faculty, libraries can ensure that AI is integrated effectively into curricula and research workflows, and that students are prepared for a future in which AI will play an increasingly important role in their academic and professional lives.
Students

During the Keenious project, students represented the largest potentially affected user group, as well as the primary source of ethical considerations. While the authors consider Keenious a relatively benign application of AI and semantic technology, it has the potential to influence student behavior in significant ways. This is particularly important given the role that libraries play in teaching research skills to students. One way that Keenious and other AI tools can affect student behavior is by encouraging modes of discovery that are constrained by the biases of the people who develop them. For example, an AI tool may recommend certain sources of information over others based on preexisting biases in the data used to train the model. The designers of AI models may be steered toward data sources that shape results according to funding sources, legal jurisdiction, country of origin, and numerous other constraints. This can produce a limited and potentially biased understanding of a topic, which can affect the quality of a student's research.

Another important consideration is the impact of AI on data privacy. The Family Educational Rights and Privacy Act (FERPA) requires educational institutions to protect the privacy of student educational records, including any data pertaining to a student's educational record that is collected, stored, or shared. AI in academic libraries has the potential to collect and analyze large amounts of data, which can threaten student privacy if not handled properly. Libraries must therefore secure users' privacy rights when choosing new tools and services.

Ethical Considerations

Because of its reliance on AI technology, the decision to implement Keenious for a year-long trial went beyond a traditional library resource needs assessment.
In the review process prior to implementation, discussions raised several new ethical questions surrounding data privacy, transparency, and the impact on research processes. One primary concern was how Keenious collects and uses the data that users provide. To address this concern, the CMU team discussed it with the vendor and closely examined the tool's privacy policies and terms of use. Unlike some recommender tools, Keenious does not store personal or user-supplied data to improve its recommendations or further train its algorithms. Interaction data collected for product optimization and usage metrics is anonymized, and Keenious collects only a handful of user data fields for account maintenance and authentication, which a user can easily delete if desired (Keenious 2023). The fact that the company is based in the European Union and subject to the GDPR as a starting point rather than an afterthought also eased privacy concerns.

The common perception of AI tools as "black boxes" that obscure technical processes and data provenance complicated the library's comfort with endorsing Keenious as an emerging tool. Implementation discussions therefore focused on several facets of transparency, including the origin of the data sources used for Keenious recommendations, potential content biases in the data, and algorithmic transparency. CMU learned that Keenious draws its article data from OpenAlex, an open catalog of scholarly outputs with content from respected sources including ORCID, DOAJ, and PubMed. Unlike recommenders developed by content providers with potential biases toward their own content, OpenAlex was viewed as a neutral, noncommercial data source with a commendable mission to support open source initiatives (OpenAlex n.d.).
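The privacy review above noted that Keenious retains only anonymized interaction data and a minimal set of account fields. As a purely hypothetical sketch (the field names, salt handling, and approach are our own assumptions, not Keenious's actual schema or implementation), pseudonymizing an interaction event before retention might look like this:

```python
import hashlib
import hmac

# Hypothetical illustration only: these field names and this pipeline are
# assumptions for the sake of example, not Keenious's implementation.
SALT = b"rotate-me-regularly"  # a real deployment would manage this secret carefully

# Fields that identify a person and should never reach usage metrics.
IDENTIFYING_FIELDS = {"email", "name", "ip_address"}

def pseudonymize(event: dict) -> dict:
    """Return a copy of an interaction event that is safe for analytics:
    identifying fields are dropped and the user ID becomes a keyed hash."""
    cleaned = {k: v for k, v in event.items() if k not in IDENTIFYING_FIELDS}
    if "user_id" in cleaned:
        digest = hmac.new(SALT, cleaned["user_id"].encode(), hashlib.sha256)
        cleaned["user_id"] = digest.hexdigest()[:16]
    return cleaned

event = {
    "user_id": "jdoe",
    "email": "jdoe@example.edu",
    "ip_address": "10.0.0.5",
    "action": "recommendation_clicked",
    "article": "W2741809807",  # an OpenAlex-style work identifier
}
safe = pseudonymize(event)
# 'safe' keeps the action and article, but no direct identifiers.
```

A sketch like this is also a useful lens for vendor conversations: it makes concrete which fields a privacy policy promises to drop, and which only to hash.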
Work remains to determine whether there are gaps in subject coverage, which CMU intends to include in future assessment activities.

In discussions about algorithmic transparency, the Keenious technical team acknowledged the difficulty of sharing details of a complex technical process while meeting users' need for insight into the resulting recommendations. A newer feature called "Ranking Information," which shows the score of a recommended article based on the predicted meanings of shared terms, is a first step toward greater transparency (see Figure 1). The Keenious team expressed a genuine interest in putting transparency first when employing AI technology, and in encouraging users to actively engage with results and interrogate findings rather than use the tool to cut corners. This approach is especially useful to libraries seeking to promote Keenious not as a research shortcut, but as a different way to analyze research outputs.

A further concern was the long-term roadmap of Keenious as a fairly new tool in a volatile and evolving information landscape. Although the tool satisfied CMU's initial criteria for user privacy and transparency, we also needed to consider (1) how new features and integrations might shift the tool's emphasis on matters such as user privacy and transparency, and (2) the risk that Keenious could be acquired by a larger entity and lose some of its neutrality. These questions prompted CMU to request more information about the long-term roadmap for Keenious and to include this step in future assessment discussions. Although roadmap documents are no guarantee of a product's future, they are excellent resources for understanding the business objectives of newer products.
Ethical concerns in this realm are a moving target, and any assessment plan for AI-based tools should strive to account for feature expansion over time.

Ethical Considerations: Student Research Behaviors

One of the major ethical implications of Keenious is how the tool might change student approaches to research. Because Keenious removes the need to formulate key search terms, functioning instead through a drag-and-drop or highlight-text approach, there is a real concern that students may change their research habits as a result of using a tool that requires less personal contemplation. Additionally, if students are drawn to Keenious because of its ease of use, we need to consider whether they will interrogate the sources that result from a Keenious search or simply assume those sources are the best for their research. To assess whether Keenious might be detrimental to student research habits prior to implementation, we asked the following questions:

1. Will users use Keenious responsibly? What would "irresponsible" use look like?
2. What impact will Keenious have on user behavior? On the scholarly research process?
3. Will students see Keenious as a one-stop shop (as many do with Google Scholar)?
4. Will students automatically assume the recommended articles are relevant, and thus spend less time evaluating sources?
5. Will students trust AI more or less than current tools as a recommender of relevant articles?
Figure 1: The Ranking Information feature in Keenious, showing an article's recommendation score.

These initial questions not only informed whether initial adoption of Keenious was ethical, but also helped us identify aspects to watch during implementation, both in our user assessment stages and in our observations as instructors and promoters of the tool. The questions served as touchstones, keeping ethical and user-focused ideas in mind as we approached implementation. While they do not all appear explicitly on the surveys we developed, they did inform survey creation; for example, while we may not ask students outright whether they use Keenious responsibly, we created survey questions about where else they do research and whether they thought Keenious presented them with relevant articles. In addition, as Keenious use grows and we receive more research consultations and questions related to the tool, these questions will help shape our informal conversations with Keenious users. With these frameworks in mind, we can guide reference conversations with students using the tool and ask informed questions about user perceptions of its relevance, utility, and trustworthiness. Beyond helping us determine what to ask in formal and informal assessments, these questions shaped our broader approach to implementing AI-based tools; because we hope to create a framework for the ethical adoption of AI-based tools within our university, questions like these are indispensable for working through the practical and theoretical concerns that AI tools raise in the library setting, both now and in the future.
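Keenious has not published its ranking algorithm, so the following is only an illustrative sketch of how a highlight-text recommender can score candidate articles against a highlighted passage, here using classic TF-IDF weighting and cosine similarity over an invented toy corpus (the article data and scoring are our own, not the tool's actual method):

```python
import math
from collections import Counter

# Invented toy corpus standing in for article abstracts.
ARTICLES = {
    "A1": "machine learning methods for text recommendation in libraries",
    "A2": "gothic architecture of medieval cathedrals in france",
    "A3": "neural networks and bias in automated text analysis",
}

def tokenize(text: str) -> list[str]:
    return text.lower().split()

def tf_idf_vector(tokens, idf):
    # Term frequency scaled by inverse document frequency.
    counts = Counter(tokens)
    return {t: (c / len(tokens)) * idf.get(t, 0.0) for t, c in counts.items()}

def cosine(u: dict, v: dict) -> float:
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(highlighted_text: str):
    docs = {k: tokenize(v) for k, v in ARTICLES.items()}
    n = len(docs)
    vocab = {t for toks in docs.values() for t in toks}
    # Smoothed IDF so terms appearing in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + sum(t in toks for toks in docs.values()))) + 1
           for t in vocab}
    query = tf_idf_vector(tokenize(highlighted_text), idf)
    scores = {k: cosine(query, tf_idf_vector(toks, idf)) for k, toks in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank("bias in machine learning text analysis")
# Articles sharing the most distinctively weighted terms with the highlight rank first.
```

Production recommenders typically rely on learned semantic embeddings rather than raw term overlap, which is precisely why surface features like Ranking Information matter: they give users some visibility into an otherwise opaque scoring process.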
Ethical Considerations: Teaching with Keenious

During the Keenious rollout, we recognized the importance of how we represent Keenious to our users, both in instruction sessions and through remote pedagogy tools like LibGuides. To practice responsible pedagogy with Keenious, we focused on three factors: (1) framing the tool for our instructors, (2) framing the tool for our internal users, and (3) framing the tool for the broader community.

One of the most important things when teaching any new tool (especially a tool with the potential to change student research habits) is to frame it not as a one-stop shop or a superior search tool, but as one component of the larger research ecosystem. Our library instructors have highlighted Keenious in various ways: as a search tool alongside many different databases, and as a tool for generating topics and key terms in early-stage research. Regardless of how our librarians present the tool, it is imperative that we (a) give library instructors enough information to feel comfortable using the tool in their sessions and (b) frame Keenious as just one component of the complex research process.

Additionally, as CMU is likely the first adopter of Keenious in the US, our librarians have inadvertently become representatives of Keenious to other librarians curious about implementing the tool in their own libraries. Numerous libraries have contacted our librarians about Keenious, and in these interactions our librarians have become instructors not only to our students and faculty, but to other librarians across the country. As such, our efforts to frame this tool for our library instructors were crucial.
Because we cannot be sure which of our librarians will be contacted by other libraries curious about the tool, we need to ensure that our librarians have enough information to discuss it at conferences, in reference interactions, or in informal conversations as questions arise from peers at other institutions. This also makes it necessary to develop strategies for future situations in which we are early adopters of a technology and are therefore consulted by others about its merits and flaws.

Ethical Considerations: Additional Concerns

When implementing Keenious, our team was aware that we needed to navigate not only the ethical concerns of the tool itself, but also the fact that users in various departments might view AI tools differently. We needed to implement the tool knowing that users would bring their own preconceptions and reservations to both Keenious and any future AI tools we integrate into the library ecosystem. For instance, the English department at CMU, like many English departments globally, has voiced concerns about tools like ChatGPT, and currently runs a weekly discussion group on the ways that ChatGPT and similar tools could change how students and professionals approach writing as these tools grow more sophisticated. Such concerns are not limited to the English department: questions about how to handle the influx of AI tools in the university environment were enough to warrant an AI tools FAQ from CMU's Eberly Center for Teaching Excellence & Educational Innovation (2023). Although the FAQ centers on ChatGPT, ChatGPT's association with AI tools more broadly means that our discussions with faculty will not happen independently of the associations people already have with AI tools in the academic setting.
Following this implementation of Keenious, we intend to gather more information about our faculty's specific reservations regarding AI tools so that we can discuss Keenious in ways that address their particular concerns and questions. Because a university is not a monolith, we must approach each department with fresh eyes and not assume that one department's questions and concerns will be identical to another's. When meeting with departments and promoting new tools, we must remain open to feedback, be inquisitive about faculty and student motivations for using or not using these tools, and facilitate conversations with these groups.

Assessment

As our team implemented Keenious, we knew that a key component of analyzing its impact would be assessing student, faculty, and staff interactions with the tool. We therefore needed to build in ways to gather information on how and when users incorporated the tool into their own research workflows, as well as on how the tool might change student and faculty research practices. We also wanted to learn whether our users consider this tool more or less effective than similar resources. In our assessment, we decided to gather the following types of information:

1. Introductory questions (how users heard about the tool, how often they use it, and what other tools they use for research)
2. Use of the resource (how and when users employ the tool)
3. Likes and dislikes (the tool's efficacy and obstacles to use)
4. User experience (the tool's usability and accessibility)
5. Future use (intended continued use, recommendations, etc.)

We developed assessment plans targeting two user groups.
First, we developed plans for "internal users" of Keenious (see Appendix 1). These users include faculty and staff within the library as well as faculty from departments across the university. Although we want a wide variety of internal users to participate in this assessment, instruction librarians and teaching faculty are especially crucial to shaping our plans for Keenious going forward. Their input on how the tool may change the research process, and on how effectively it finds articles related to a particular search query, is crucial to determining both the ethics and the efficacy of the tool in our university's research ecosystem. Second, we developed assessment plans for "student users" of Keenious (see Appendix 2). The questions for student users are broadly similar to those for internal users, but the surveys differ slightly: students are asked about the types of projects they used Keenious for (see Appendix 2), while internal users are asked whether they intend to use the resource in their teaching.

This assessment program will launch in the fall semester of 2023, allowing us time to promote the resource in fall classes and, we hope, to accrue some repeat users of Keenious so we can obtain more meaningful feedback. When distributing surveys, we will not target only Keenious users (because of privacy concerns in directly contacting users of the tool); instead, we plan to distribute surveys through broader listservs (department-specific, library, etc.) as well as through targeted emails to faculty we know are invested and interested in AI conversations more broadly. Additionally, we included a feedback mechanism in our Keenious LibGuide; as the LibGuide is one of the direct lines of communication between the library and Keenious users, it made sense to offer an alternate pathway for submitting feedback there as well.
Although this survey is shorter than the other two (see Appendix 3), an additional pathway for collecting user responses gives us another in-road for starting conversations with our users and improving the tool, with minimal effort on our end.

Documentation

During implementation, our team consulted several policies, best practices, and codes of ethics for additional guidance on ethical considerations relevant to AI-based technologies in libraries. Overall, Keenious adheres to the recommendations and best practices pertaining to data privacy and algorithmic transparency. The sources below were, and will likely continue to be, valuable references for evaluating and implementing similar recommender tools.

• CMU Academic Integrity Policy
Because promoting academic integrity is a core responsibility of the CMU community, the university's academic integrity policy is integral at the institutional level for ensuring that all services and tools offered through the libraries meet stated expectations (Carnegie Mellon University 2020).

• ODI Recommended Practice
Facilitated by the NISO Open Discovery Initiative Standing Committee, the ODI Recommended Practice aims to promote the adoption of conformance statements and to streamline the relationships between discovery service providers, content providers, and libraries. It offers general recommendations, as well as best practices and conformance checklists for each sector. The Keenious documentation aligns with the NISO ODI recommendation that discovery service providers "explain the fundamentals of how metadata is generally utilized within the relevance algorithm (mapping metadata to indexes, weighting of indexes, etc.) and how it enhances discoverability" (Open Discovery Initiative Standing Committee 2020).
• IFLA Statement on Libraries and Artificial Intelligence
The IFLA Statement on Libraries and Artificial Intelligence provides key ethical considerations for AI technologies in libraries, including privacy, bias, and transparency (IFLA Committee on Freedom of Access to Information and Freedom of Expression 2020). For example, IFLA notes the importance of libraries knowing how vendors train AI systems and tools, and that transparency and explainability can help detect and address bias. The statement's recommendations for libraries adopting AI tools, listed below, were invaluable in our initial evaluation, procurement, and implementation of Keenious:
• Help patrons develop digital literacies that include an understanding of how AI and algorithms work, and the corresponding privacy and ethics questions
• Ensure that any use of AI technologies in libraries is subject to clear ethical standards and safeguards the rights of their users
• Procure technologies that adhere to legal and ethical privacy and accessibility requirements

• IFLA Code of Ethics for Librarians and Other Information Workers
We also referenced the IFLA Code of Ethics for Librarians and Other Information Workers (IFLA 2012), which embodies the ethical responsibilities of the library profession. This code of ethics serves as a set of guiding principles for providing information service in modern society, with an emphasis on social responsibility.

Lessons Learned and Future Work

Although still in its early phases, the implementation of Keenious at CMU has been an eye-opening experience that has yielded a number of takeaways and ideas for future investigation.
We quickly realized that, while AI tools can in many ways be approached from an area of need like other library resources, the nature of the underlying technology requires additional considerations to determine what is appropriate for library adoption. These considerations are ethically fraught, touching on sensitive issues including privacy, transparency, and the perceived impact on research behaviors and pedagogy. We have only scratched the surface with our ethical discussions at CMU, and much work remains to engage our campus population in evaluating the long-term impact of tools such as Keenious on research behavior. We must also carefully structure our assessment strategy to track changes that may be ethically concerning, such as new features or data collection activities.

This case study has also highlighted the need for new frameworks for evaluating the complete lifecycle of AI-based tools, from acquisition to implementation to ongoing assessment. Much as the NISO Open Discovery Initiative grew from the need for best practices and standards for index-based discovery services, the unique nature of tools centered on AI technologies requires new standards for carefully examining product features and vendor policies before and after implementation. It is also important to engage with product vendors as much as possible when ethical questions about their tools arise. Librarians have long been experts in advocating for user privacy, and should engage vendors in conversations about what privacy looks like in today's data-driven landscape. We had very positive experiences working with Keenious: they were quick to answer questions, provide supporting documentation, and connect us with their technical teams so we could better understand their product.
They also organized a workshop for librarians on generative AI in the spring of 2023 to gather more feedback on the ethical implications of AI-powered tools and to guide future development of their product. They clearly understood our need to provide ethically sound tools for our campus community and demonstrated a genuine interest in developing their technology responsibly. Ultimately, librarians can help define what effective transparency means in the research and information landscape, ensuring that users can engage critically with new technologies. We must acknowledge that disruptive change from AI tools has already arrived, and libraries should be proactive in preparing for whatever ethical challenges lie ahead.

Data Availability

Assessment plan survey questions are available under the article supplementary files:
Appendix 1: Survey for internal assessment
Appendix 2: Survey for student users of Keenious
Appendix 3: LibGuide survey

Acknowledgements

This research case study was developed as part of an IMLS-funded Responsible AI project, through grant number LG-252307-OLS-22.

Competing Interests

The authors declare that they have no competing interests.

References

Ali, Muhammad Yousuf, Salman Bin Naeem, and Rubina Bhatti. 2021. "Artificial Intelligence (AI) in Pakistani University Library Services." Library Hi Tech News 38 (8): 12–15. https://doi.org/10.1108/lhtn-10-2021-0065.

Andrews, James E., Heather Ward, and Jungwon Yoon. 2021. "UTAUT as a Model for Understanding Intention to Adopt AI and Related Technologies Among Librarians." The Journal of Academic Librarianship 47 (6): 102437. https://doi.org/10.1016/j.acalib.2021.102437.

Asemi, Asefeh, and Adeleh Asemi. 2018. "Artificial Intelligence (AI) Application in Library Systems in Iran: A Taxonomy Study." Library Philosophy and Practice 1840. https://digitalcommons.unl.edu/libphilprac/1840.

Berendt, Bettina, Özgür Karadeniz, Sercan Kıyak, Stefan Mertens, and Leen d'Haenens. 2023. "Bias, Diversity, and Challenges to Fairness in Classification and Automated Text Analysis. From Libraries to AI and Back." arXiv. https://doi.org/10.48550/arxiv.2303.07207.

Bubinger, Helen, and Jesse David Dinneen. 2021. "Actionable Approaches to Promote Ethical AI in Libraries." Proceedings of the Association for Information Science and Technology 58 (1): 682–684. https://doi.org/10.1002/pra2.528.

Carnegie Mellon University. n.d. "Strategic Plan 2025." Accessed March 31, 2023. https://www.cmu.edu/strategic-plan/strategic-recommendations/21st-century-library.html.

Carnegie Mellon University. 2020. "Carnegie Mellon University Policy on Academic Integrity." University Policies. Accessed March 27, 2023. https://www.cmu.edu/policies/student-and-student-life/academic-integrity.html.

Cox, Andrew M., Stephen Pinfield, and Sophie Rutter. 2019. "The Intelligent Library: Thought Leaders' Views on the Likely Impact of Artificial Intelligence on Academic Libraries." Library Hi Tech 37 (3): 418–435. https://doi.org/10.1108/lht-08-2018-0105.

Duncan, Adrian St. Patrick. 2022. "The Intelligent Academic Library: Review of AI Projects & Potential for Caribbean Libraries." Library Hi Tech News 39 (5): 12–15. https://doi.org/10.1108/lhtn-01-2022-0014.

Eberly Center for Teaching Excellence & Educational Innovation. 2023. "AI Tools (ChatGPT) FAQ." Accessed March 30, 2023. https://www.cmu.edu/teaching/technology/aitools/index.html.

Ex Libris. n.d. "bX Recommender." Accessed March 29, 2023. https://exlibrisgroup.com/products/bx-recommender.

Frederick, Jennifer, and Christine Wolff-Eisenberg. 2020. "Ithaka S+R US Library Survey 2019." Ithaka S+R. https://doi.org/10.18665/sr.312977.

Getz, Ronald J. 1991. "The Medical Library of the Future." American Libraries 22 (4): 340–343. http://www.jstor.org/stable/25632204.

IFLA. 2012. "IFLA Code of Ethics for Librarians and Other Information Workers (Full Version)." August 2012. https://www.ifla.org/publications/ifla-code-of-ethics-for-librarians-and-other-information-workers-full-version.

IFLA Committee on Freedom of Access to Information and Freedom of Expression. 2020. "IFLA Statement on Libraries and Artificial Intelligence." October. https://repository.ifla.org/handle/123456789/1646.

Keenious. 2023. "How Keenious Recommends Research Articles." Keenious Knowledgebase. Accessed February 27, 2023. https://help.keenious.com/article/54-how-keenious-recommends-research-articles.

McCorduck, Pamela, and Cli Cfe. 2004. Machines Who Think: A Personal Inquiry into the History and Prospects of Artificial Intelligence. 2nd ed. A K Peters/CRC Press. https://doi.org/10.1201/9780429258985.

OpenAlex. n.d. "About." OpenAlex. Accessed March 27, 2023. https://openalex.org/about.

Open Discovery Initiative Standing Committee. 2020. "NISO RP-19-2020, Open Discovery Initiative: Promoting Transparency in Discovery." NISO. https://doi.org/10.3789/niso-rp-19-2020.

Panda, Subhajit, and Rupak Chakravarty. 2022. "Adapting Intelligent Information Services in Libraries: A Case of Smart AI Chatbots." Library Hi Tech News 39 (1): 12–15.
https://doi.org/10.1108/lhtn-11-2021-0081. pastva, joelen. 2023. “keenious: introduction.” libguide. https://guides.library.cmu.edu/c.php?g=1293235&p=9497310. r-moreno, maría d., bonifacio castaño, david f. barrero, and agustín m. hellín. 2014. “efficient services management in libraries using ai and wireless techniques.” expert systems with applications 41 (17): 7904–7913. https://doi.org/10.1016/j.eswa.2014.06.047. https://doi.org/10.7191/jeslib.800 https://doi.org/10.1108/lhtn-01-2022-0014 https://www.cmu.edu/teaching/technology/aitools/index.html https://exlibrisgroup.com/products/bx-recommender https://doi.org/10.18665/sr.312977 http://www.jstor.org/stable/25632204 https://www.ifla.org/publications/ifla-code-of-ethics-for-librarians-and-other-information-workers-full-version/ https://www.ifla.org/publications/ifla-code-of-ethics-for-librarians-and-other-information-workers-full-version/ https://repository.ifla.org/handle/123456789/1646 https://help.keenious.com/article/54-how-keenious-recommends-research-articles https://doi.org/10.1201/9780429258985 https://openalex.org/about https://doi.org/10.3789/niso-rp-19-2020 https://doi.org/10.1108/lhtn-11-2021-0081 https://guides.library.cmu.edu/c.php?g=1293235&p=9497310 https://doi.org/10.1016/j.eswa.2014.06.047 ethical considerations in integrating ai in research consultations: assessing the possibilities and limits of gpt-based chatbots journal of escience librarianship 13 (1): e846 doi: https://doi.org/10.7191/jeslib.846 issn 2161-3974 full-length paper ethical considerations in integrating ai in research consultations: assessing the possibilities and limits of gpt-based chatbots yali feng, university of illinois at urbana-champaign, urbana, il, usa, yalifeng@illinois.edu jun wang, independent researcher, syracuse, ny, usa steven g. 
anderson, university of illinois at urbana-champaign, urbana, il, usa abstract objective: this case study sought to provide early information on the accuracy and relevance of selected gpt-based product responses to basic information queries, such as might be asked in librarian research consultations. we intended to identify positive possibilities, limitations, and ethical issues associated with using these tools in research consultations and teaching. methods: a case simulation examined the responses of gpt-based products to a basic set of questions on a topic relevant to social work students. the four chatbots (chatgpt-3.5, chatgpt-4, bard, and perplexity) were given identical question prompts, and responses were assessed for relevance and accuracy. the simulation was supplemented by reviewing actual user exchanges with chatgpt-3.5 using a sharegpt file containing conversations with early users. received: november 15, 2023 accepted: february 5, 2024 published: march 6, 2024 keywords: gpt-based chatbot, chatgpt, sharegpt, knowledge practice, disposition, socratic questioning, research consultation, artificial intelligence, ai citation: feng, yali, jun wang, and steven g. anderson. 2024. “ethical considerations in integrating ai in research consultations: assessing the possibilities and limits of gpt-based chatbots.” journal of escience librarianship 13 (1): e846. https://doi.org/10.7191/jeslib.846. the journal of escience librarianship is a peer-reviewed open access journal. © 2024 the author(s). this is an open-access article distributed under the terms of the creative commons attribution 4.0 international license (cc-by 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. see https://creativecommons.org/licenses/by/4.0. 
open access https://doi.org/10.7191/jeslib.846 mailto:yalifeng%40illinois.edu?subject= https://doi.org/10.7191/jeslib.846 https://creativecommons.org/licenses/by/4.0/ https://orcid.org/0000-0003-1066-9659 https://orcid.org/0009-0003-9439-4599 https://orcid.org/0009-0003-9590-6250 journal of escience librarianship 13 (1): e846 | https://doi.org/10.7191/jeslib.846 e846/2 abstract continued results: each product provided relevant information to queries, but the nature and quality of information and the formatting sophistication varied substantially. there were troubling accuracy issues with some responses, including inaccurate or non-existent references. the only paid product examined (chatgpt-4), generally provided the highest quality information, which raises equitable access to quality technology concerns. examination of sharegpt conversations also raised issues regarding ethical use of chatbots to complete course assignments, dissertation designs, and other research products. conclusions: we conclude that these new tools offer significant potential to enhance learning if well-employed. however, their use is fraught with ethical challenges. librarians must work closely with instructors, patrons, and administrators to assure that the potential is realized while ethical values are safeguarded. summary this project explored some preliminary issues in using gpt-based chatbot applications as alternatives to, accessories, or context in conducting research consultations (rc), which are among the most important services provided in academic libraries. we developed a case example intended to simulate chatbot use by relatively novice users in searching for information on very basic research and education related questions, such as might be asked by a student or other patron who is not an expert either on the topic or on chatbot use. we focus on issues such as the ease of obtaining relevant information, its accuracy, and related ethical considerations in its use. 
based on our findings, as well as the extremely rapid emergence of enhanced ai applications, we discuss ethical considerations facing librarians and their patrons as they use chatbots in developing research products and in providing related library rc.

project details

the case study explores the capabilities of, and issues associated with, using different gpt-based chatbots in a research consultation (rc) context. we define rcs as scheduled, one-to-one, personalized research services provided by subject librarians for students and researchers in academic libraries. rcs may be viewed as the intersection of reference interviews and instruction; they are sometimes considered a kind of reference interview as well as a form of information literacy instruction (association of college & research libraries 2011). while we are particularly interested in rc, our exploration focuses on unguided interactions between people and chatbots. such interactions can offer important guidance in constructing rcs and in considering ethical chatbot use more generally.

roles, services, and infrastructures

the project involved a collaboration between a university behavioral sciences librarian, a senior professor in a field served by that librarian (social work), and an independent expert with extensive experience using large data sets and artificial intelligence (ai). the librarian and professor defined research consultation scenarios likely to occur as the librarian served social work faculty and students, and they also led the construction and analysis of the case simulation that was the centerpiece of the project. the ai expert led the review and selection of the gpt-based products to be used, and also provided technical guidance on chatbot inquiries and related interpretations.
in addition, he accessed sharegpt data that served as important supplemental information for our case simulation. we did not have to draw upon any formal university services or collections to execute this case study, although the librarian did review a historical record of research consultations she had compiled to gain a basic understanding of the substance and range of rc in the social science areas she serves. beyond our own efforts, the project primarily required access to leading gpt-based products, all of which are easily available both inside and outside of university libraries. similarly, no special computing software or capabilities were required beyond those typically available on office computers. in selecting the gpt-based products to use, our technical expert sought to identify popular but diverse products to allow potentially useful comparisons. in addition, to examine possible differences associated with free versus paid products, we included chatgpt-4 as a paid product along with the free products we used. chatgpt-4 required a subscription fee of $20 per month when the study was conducted.

process and methods

we developed the case study simulation using a social work/social policy scenario through which we could observe and reflect upon ethical issues in chatbot use. we included a comparison of selected gpt-based chatbot products: chatgpt-3.5, chatgpt-4, perplexity ai, and google bard. our intent was to explore how rapidly improving ai tools might perform differently and how those differences interact with corresponding ethical issues. we especially valued taking the user's growth into consideration when reflecting on ethical issues, and we observed that ethical considerations will vary with a user's expertise level and purpose of use: the standard is not static, but rather dynamic and interactive.
we referred to the acrl framework for information literacy for higher education for guidance on information literacy, which it defines as "the set of integrated abilities encompassing the reflective discovery of information, the understanding of how information is produced and valued, and the use of information in creating new knowledge and participating ethically in communities of learning" (association of college & research libraries 2016). an initial consideration was how we could simulate a search on a basic question to facilitate our preliminary exploration of chatbot use. we decided on a strategy of having our subject area expert construct a few basic research-related questions that an undergraduate student from social work or a related field might be asked to pursue in gathering information for a term paper or other class project. after the team reviewed this initial list, we selected a question related to social work and technology on which to simulate a search. it is notable that social work students typically receive little educational training on this issue, and yet "harnessing technology for social good" has been identified as one of the 13 grand challenges of social work (singer et al. 2022, 230). as such, we viewed this as a question of importance in the social work field, but one in which an undergraduate student typically would have limited background and hence would likely be a novice in terms of domain knowledge. in addition, we assumed that the user would be a relative novice in terms of experience in using ai, given that our construction occurred in the early stages of gpt-based chatbot use. hence, in our simulation we chose to limit the sophistication and scope of prompting.
we recognize that this conceptualization of "novice" is crude, and that it represents the least developed or "beginning" endpoint on a continuum of domain and technical competencies useful in addressing the question asked. yet this is an important group to consider as the use of gpt-based chatbots for educational and research inquiry ramps up, and one that will be of particular importance to librarians and instructors. to guide our search, we formulated our question in this way: "technology is likely to have varying impacts on different groups in society. what are some of the overarching concerns with how the rapid development of technology may have negative effects on poor people and other disadvantaged groups?" the ai expert and subject librarian then simulated a student search of this topic. we developed a series of prompts to simulate a user's exploration process and data collection efforts, first starting with the broad research question and then drilling down into specific areas. for example, after the broad research question was presented to the chatbots directly, the chatbots were asked follow-up questions about the "digital divide" and other aspects related to more fully exploring the basic question. these additional prompts can be viewed as corresponding to how someone with slightly more subject matter expertise could refine and enhance a basic search. we asked identical questions of each of the four ai products selected. for each question, we collected the generated output for subsequent analysis. we then evaluated selected aspects of the accuracy and relevance of the answers, and compared them among chatgpt-3.5, chatgpt-4, google bard, and perplexity ai. this included review of the output from each chatbot by team members, as well as related follow-up checking to verify the accuracy of response information.
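the data-collection step just described (presenting identical prompts to each product and saving the output for later review) can be sketched as a small logging harness. this is purely an illustration, not the project's actual procedure; the `ask` function is a hypothetical placeholder for querying each product's interface or pasting in responses by hand.

```python
import csv
from datetime import date

# hypothetical stand-in: in the study each product was queried through its
# own interface; a real harness would call each product's api here or
# record manually pasted responses.
def ask(model: str, prompt: str) -> str:
    return f"[{model} response to: {prompt[:40]}]"

MODELS = ["chatgpt-3.5", "chatgpt-4", "google bard", "perplexity ai"]
PROMPTS = [
    "technology is likely to have varying impacts on different groups in "
    "society. what are some of the overarching concerns with how the rapid "
    "development of technology may have negative effects on poor people and "
    "other disadvantaged groups?",
    "can you describe the most important research and researchers in the "
    "'digital divide' research area?",
]

def collect_responses(path: str = "responses.csv") -> str:
    """present identical prompts to every product and log each response
    so accuracy and relevance can be reviewed afterward."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "model", "prompt", "response"])
        for prompt in PROMPTS:  # identical questions for each product
            for model in MODELS:
                writer.writerow(
                    [date.today().isoformat(), model, prompt, ask(model, prompt)]
                )
    return path
```

logging every exchange with a timestamp matters here because, as the authors note, chatbot capabilities change rapidly and results are point-in-time.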
in assessing accuracy, we focused on determining whether the chatbot output contained notable factual inaccuracies that would be obvious to a subject matter expert. detailed reference checking then served to verify whether the sources provided in chatbot responses were accurate. with respect to relevance, we focused on the substantive quality of the output, how closely the response focused on the question, and how well response components were integrated. we ran these tests during march–may 2023, and we fully recognize the point-in-time nature of these searches and that the capabilities of these tools are changing very rapidly.

results and supplemental methods

all of these models were able to generate an overview with relevant information in response to our initial broad research question. however, the responses differed in important ways. first, the range of information varied significantly. chatgpt-3.5, bard, and perplexity each generated five different areas of concern related to technological development, while gpt-4 provided 10. second, while the different chatbots provided some overlapping types of concern, the issues selected also varied significantly. only the well-known "digital divide" was included as a category of response by each of the chatbots. finally, the chatbots exhibited a noticeable differentiation of focus in their responses. gpt-3.5 and gpt-4 both provided responses that focused specifically and consistently on concerns about poor and disadvantaged groups, while some bard and perplexity responses drifted into technology concerns of importance to broader populations (e.g., fake news, sedentary lifestyles, cyberbullying). perplexity differed from the other three chatbots in that it offered links to related citations, which made it easy to take a first step in exploring relevant information in more depth.
in considering more detailed follow-up questions, we decided to focus on the digital divide, given that it was the one topic on which every chatbot provided initial responses, and it is also among the most fundamental technology issues facing poor people. in particular, we asked the chatbots: "can you describe the most important research and researchers in the 'digital divide' research area?" again, the chatbots provided a great deal of relevant information, but the organization and quality varied substantially. perhaps consistent with the dual focus of the question on "most important research" and "researchers," the data were organized differently by each chatbot and sometimes focused on one aspect (i.e., research versus researchers) more than the other. chatgpt-4 again stood out as well organized, providing an initial description of key related research areas and then identifying some well-known scholars with brief but substantive information on their contributions. bard also provided a useful basic summary of important digital divide research areas, but its list of researchers was minimal, and it did not provide much information on their contributions. chatgpt-3.5 provided some useful information on digital divide researchers but no integrated discussion of important research areas. perplexity likewise offered no overview summary or integration of topics, but it provided many subcategories of relevant research with brief summaries and again provided links to related citations. we then asked a follow-up question on one of the prominent researchers mentioned in responses to the previous question: "what are the most important research publications written by jan van dijk on digital divide?" all four chatbots responded to the question and listed works purportedly written by van dijk. as might be expected, there were variations in the books and articles provided.
gpt-3.5 and gpt-4 appeared to provide the highest quality information, with gpt-4 again providing more detail and presenting the information in a well-organized and easy-to-use format. in contrast, perplexity provided relatively little information, while bard included a short summary of van dijk along with four purported publications by him. another important concern emerged when we engaged in systematic fact-checking on the publications each chatbot provided for van dijk. the bard information proved most troubling, in that none of the four publications provided could be verified as real. chatgpt-3.5 also had one non-verifiable publication among the five it provided, and both chatgpt-3.5 and chatgpt-4 provided other publications with partially incorrect information, such as publication dates or collaborators. in contrast, and consistent with follow-ups on its linked sources in earlier questions, the perplexity citation information was consistently accurate. in summary, all of these large language models (llms) provided a reasonable introductory description of a well-known research topic, in terms both of the accuracy of the information provided and its relevance to the broader issue being queried. however, even these introductions varied substantially, and the llms generated divergent answers to specific questions that required additional knowledge. in addition, the credibility of the information sources provided varied greatly, resulting in major accuracy and related quality issues. the llms also varied widely in their organization and integration of the materials provided, ranging from highly organized and well-integrated responses, such as those provided by chatgpt-4, to more casual summaries with some drift of focus from the questions asked.
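the reference checking described above was done by hand. as one hedged illustration of how such spot-checking could be partly systematized (not the authors' actual workflow), a small helper can assemble lookups against the public crossref works api, which supports free-text bibliographic and author queries; the example title is a real van dijk book used purely for illustration.

```python
from urllib.parse import urlencode

def crossref_query_url(title: str, author: str, rows: int = 5) -> str:
    """build a crossref works-api query url for spot-checking whether a
    citation produced by a chatbot matches a real, indexed publication.
    open the url in a browser or fetch it with any http client, then
    compare the top results against the claimed citation."""
    params = {
        "query.bibliographic": title,  # free-text match on title/venue/year
        "query.author": author,        # free-text match on author names
        "rows": rows,                  # limit the number of candidates
    }
    return "https://api.crossref.org/works?" + urlencode(params)

# example: checking a publication a chatbot attributed to jan van dijk
url = crossref_query_url(
    "the deepening divide: inequality in the information society",
    "jan van dijk",
)
```

note that crossref coverage is strongest for doi-registered journal articles and books; a citation that fails to match is a signal to check further (e.g., in a library catalog), not proof of fabrication.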
april 2023 data set from sharegpt

while the case simulation was our principal source of information for this project, we also decided to examine actual user case reports to gain a preliminary sense of selected ways that chatbots are being used in pursuit of research-related information. we did so by accessing user case reports available from sharegpt, a chrome extension that allows users to share their conversations with chatgpt-3.5 by generating a url that can be shared to social media or other internet sources. the shared urls were originally open for public access, which was later discontinued after accusations that google was using this data set to help fine-tune its own model, bard. we were able to download 90,000 conversations that had entered the public domain before access was closed. we identified 53 conversations related to research consultation in this data set by using "research topic" and "research question" as search terms. in this sense, the cases diverge from our simulation exercise focusing on novice users, and many also fall outside the normal boundaries of library rc. nonetheless, these cases provide useful broader exposure to a diverse and rich range of early uses, and thus can enhance our consideration of ethical issues. we read the output from these cases to identify selected ways in which actual users were interacting with chatgpt, as well as to ascertain some related ethical concerns. we emphasize that this exercise was purely exploratory and is not intended to quantify aspects like most common uses or the adequacy of responses. each output case included a verbatim "conversation" in which the "human" provides some background information and asks questions, and then chatgpt responds.
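the keyword-filtering step just described (isolating rc-related conversations from the larger dump) can be sketched as follows. this is a minimal illustration, not the project's actual code; the field names (`conversations`, `from`, `value`) are assumptions based on commonly shared sharegpt dumps rather than a documented schema.

```python
SEARCH_TERMS = ("research topic", "research question")

def filter_conversations(records, terms=SEARCH_TERMS):
    """return the conversations whose human turns mention any search term.
    field names are assumptions based on public sharegpt dumps."""
    matches = []
    for rec in records:
        # concatenate only the human side of each conversation
        human_text = " ".join(
            turn.get("value", "")
            for turn in rec.get("conversations", [])
            if turn.get("from") == "human"
        ).lower()
        if any(term in human_text for term in terms):
            matches.append(rec)
    return matches

# tiny illustrative sample in place of the 90,000-conversation download
sample = [
    {"id": "a", "conversations": [
        {"from": "human", "value": "help me narrow my research topic on housing"},
        {"from": "gpt", "value": "sure, here are some options..."}]},
    {"id": "b", "conversations": [
        {"from": "human", "value": "translate this paragraph to spanish"},
        {"from": "gpt", "value": "aqui esta..."}]},
]
selected = filter_conversations(sample)
```

filtering on the human turns only, rather than the whole exchange, avoids matching conversations where the phrase appears solely in chatgpt's output.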
many cases include fairly long interactions, in which the person seeking information continues to refine and expand questions based on the information that chatgpt provides. a few summary points from the sharegpt cases merit attention. first, the range of questions and topics is striking, including but not limited to queries for help with short undergraduate papers or class project proposals; thesis writing; article background research, research question and hypothesis development, and writing; and what appear to be more formal funding proposals. it was not possible to review and verify the vast and diverse quantity of chatgpt output provided through these cases, but in general the responsiveness and interactions appeared impressive. second, similar to our earlier characterization of novice and more advanced users, these cases demonstrated wide ranges of sophistication among those asking questions. in more advanced conversations, the questioner was able to skillfully review the chatgpt response and continue asking follow-ups until a desired product or set of answers was produced. some differences in the types of information being sought are likewise useful in considering possible ethical boundaries or instructor guidance on the acceptable use of chatbots for classes or research projects. for example, some of the queries were seeking initial help with how to do something, such as requesting the basic steps in writing a research report. others were using chatgpt much like an editor or an advanced form of clerical support. uses of this nature included turning a set of provided materials into paragraph form; pulling bullet points or powerpoint slides from written material provided; and translating from one language to another. even some of these basic uses have ethical implications, in that it was not possible to determine whether the initial source materials presented to chatgpt were actually written by the person presenting them.
other conversations are more challenging from an ethical standpoint and suggest the need for rich discussion about acceptable use guidelines in various academic venues. first, many queries asked chatgpt to develop research questions and hypotheses related to a particular topic. one could argue that this often occurs when students and researchers review existing research, and as such it is just a more efficient way to identify possibilities. however, this identification stage is often considered fundamental to critical thinking, so having a single prompt produce suggested research questions and hypotheses raises interesting questions. a related concern is that chatgpt generally did not provide citations for such research advice, so there are important questions concerning the quality of the information provided and the related crediting of ideas. second, it was clear that many of the inquiries involved using chatgpt to formulate or actually write large portions of papers. in some cases, this was done with a single blanket prompt (e.g., write me a project proposal on a given subject). in others, it involved a long dialogue in which the initial query asked for fairly general ideas (such as a possible paper outline and sections), and then follow-ups requested more detailed drafting. the sharegpt data set is useful for studying actual prompting to gain understanding of the human side of human-chatbot interactions, as well as for analyzing prompt behavior to gain insights into the behavior of the chatbots. yet we must recognize that presenting prompts that obtain apparently ethically challenging output does not mean the output would actually be used unethically. the actual use of such output in educational and research product development is a complex issue requiring further study.
overall, the ability to capture and review actual conversations like these suggests an interesting area for further research. these case interactions point to the rich and diverse amount of information that chatgpt is able to produce when provided with specific questions. at the same time, the more detailed back and forth between the person seeking information and chatgpt suggests specific areas of inquiry in which ethical use guidance may be especially important. finally, gaining a better sense of the quality of responses, and how this may compare to alternative means of information access and research consultation, is a very complex subject requiring additional research.

background

the nature of our project did not involve implementing specific ai tools either in the library or in the collaborating social work school. rather, our intent was to learn how ai products readily available to library patrons are likely to be used, and what some of the practical and ethical impacts of such use will be on rc and teaching practices. as such, we view the benefits of our project as providing some early findings and related guidance on ethical issues that are evolving very rapidly. this obviously is an area in which discussions are exploding, not only in libraries but across all academic units, and universities and other entities are scrambling to develop responsible policies, guidelines, and resources to guide both research and teaching practices. we intend for our project to be one of many responding to major knowledge gaps related to the use of these transformational tools. the intent is to provide evidence-based findings that raise important ethical questions related to developing effective practice guidelines. with respect to other tools and guidance, we were particularly informed by the framework for information literacy for higher education (association of college & research libraries 2016).
it provides a broad framework that should be useful in framing ongoing thinking about information use in inquiry, even as ai and other information tools fundamentally change information seeking and use strategies. the ability to look at real inquiries through the sharegpt files also was very useful in beginning to understand the vast range of inquiries, the sophistication of prompting, and the possible uses and misuses of these new capabilities.

ethical considerations

because our study developed and executed a case simulation involving only our project team, we focused on identifying and assessing selected ethical challenges that our results suggest will be important in shaping university library and teaching policies and practices. we consulted several references in considering such ethical issues broadly, including the ala code of ethics, the ifla code of ethics for librarians and other information workers, the ifla statement on libraries and artificial intelligence, the national association of social workers code of ethics, and the council on social work education 2022 educational policy and accreditation standards. our overall reflection from this project is that the general stance or attitude one takes toward ai technology powerfully affects which ethical issues will receive the greatest priority. on the one hand, there are legitimate concerns regarding how this new technology may be misused, which may lead to a primary "resistance attitude" that emphasizes ethical considerations related to preventing harm. however, this approach may be overly restrictive, and it could foster a narrow focus that does not capture the rich potential of these new applications.
in contrast, an "embracing attitude" is more reluctant to limit consideration of issues to merely reducing potential harms, and instead encourages thoughtful exploration of possible benefits if appropriate accompanying guidance and guardrails can be developed. ethical considerations in such a framework appear more dynamic, multidimensional, and holistic; at this point they are largely unresolved and will require ongoing thoughtful discourse as ai applications and their use continue to develop. as we consider in the next section who will be affected in university educational settings by these emerging chatbot developments, we try to balance concerns about harm with more positive possibilities. within this broader context, our case simulation findings suggest several more specific ethical implications. first, the findings are particularly problematic for novice users, because such users are unlikely to have the prior domain knowledge helpful in distinguishing between accurate and incorrect information generated by llms. the mixture of accurate and inaccurate information found in our simulation, all of which was presented in fluent human language, is extremely challenging for novice users to untangle, and potentially difficult even for more advanced ones. to be effective, users thus need to build strong fact-checking and interpretive skills when collecting and using llm-generated information. a second, related issue extends beyond assessing basic data accuracy. we found considerable variation in the nature and specificity of the information provided by different chatbots, even when the information was accurate. as such, the quality of learning for users is likely to be affected significantly by which chatbot they happen to employ. this suggests the need for strong guidance from instructors, librarians, or others regarding the quality of different chatbot products, as well as on how best to formulate search questions.
A final emerging ethical issue pertains to equity in access to various chatbots. In our case study, the paid ChatGPT-4 model appeared to be more advanced in the focus, integration of ideas, and organization of its output than the free alternatives we tested. This suggests that users with fewer resources may get not only less information, but lower quality information. The commercialization of chatbot applications in the future is likely to amplify these early findings of qualitative differences between free and paid products. This inequality issue merits attention as universities develop guidance for chatbot use. It likewise raises another potentially troubling version of the digital divide with respect to access to higher quality chatbots.

Who Is Affected by This Project?

The use and impact of GPT-based chatbots in research, writing, and learning by university students, faculty members, and other library patrons is unfolding in many unpredictable ways and will have yet unknown influences on library practice. Based on this exploratory research, it is clear that ongoing chatbot developments will present significant opportunities and challenges for teaching and research, and in turn for research consultations and other related library services. Such developments will occur in the context of a whole university ecosystem that includes but is not limited to students, instructors, librarians, and administrators engaged in setting instructional and technology use policies. The following is a brief initial assessment of selected challenges and opportunities that our study suggests will be relevant for each of these key stakeholders.

Students

It is useful to start with students, as they are the key patrons whom other stakeholders are attempting to serve in empowering and ethical ways.
Our case illustrates just how challenging guiding the use of these new technologies is likely to be, even for the most well-meaning and dedicated students. One important challenge is ensuring that students ask chatbots questions that are most likely to result in relevant and accurate information. Doing so requires sufficient domain knowledge to formulate initial questions well, and then to follow up with thoughtful prompts based on initial responses. But even if students execute these functions reasonably well, we cannot be assured that chatbots will provide relevant and accurate information. This points to the need for careful scrutiny and ongoing checking by students of the chatbot information they generate. This is true of any search for new information, yet it is amplified in unguided chatbot searches in which consumers are unsophisticated about issues such as how the chatbot retrieves and assembles information, as well as about the accuracy and ethical use of the data retrieved. It is elevated in situations in which students are investigating a topic on which they have little background, because the lack of domain knowledge compromises one's ability to assess the accuracy and relevance of the information retrieved.

Despite these challenges, well-executed student chatbot use will present important opportunities for enhancing learning. It offers the promise of quickly generating a reasonable set of basic information, and of providing both argumentation and references that can aid thinking about next steps in investigating a topic of interest. This can be the first step in an iterative process in which students conduct deeper chatbot searches building on the initial information produced. It can allow students to quickly obtain initial information on diverse topics, and then to focus at later stages on more integrative, application-oriented, and creative thinking.

Instructors

Instructors face even more complex challenges and opportunities.
Like others, most instructors will experience a steep learning curve in developing chatbot use expertise, and often may be no more advanced than the average student they teach. Thus, they must navigate many of the same challenges as students in learning how to use chatbots in ways that are productive and uncompromising of core ethical and quality values. But their challenges extend much further, particularly in their central role in guiding and assessing student learning. Such issues include but are not limited to clarifying the extent to which chatbot use is acceptable for completing papers or other assignments; establishing how direct use of chatbot output should be quoted or otherwise cited; testing the accuracy of citations used in chatbot output; and developing familiarity concerning which GPT-based chatbots are most likely to meet performance standards. Instructors also are in the unenviable position of devising and enforcing strategies to ensure that students use chatbots ethically in completing their assignments.

Again, however, skilled and well-trained instructors can use chatbots to enhance student learning opportunities as well as to assess student learning. For example, instructors can use chatbots to rapidly scan for basic information on a wide array of topics. Chatbots also may become a relatively easy way to check on the range and depth of information students access when completing assignments. They likewise may be used to explore information on teaching techniques and accompanying materials. As chatbots improve and instructor expertise develops, it may be that chatbots in some ways become to composition what computers are to calculation.
In particular, students might be encouraged to use chatbots for basic learning searches and related compositions, thus not only instructing them in next-wave information acquisition but freeing time for higher-level critical and conceptual thinking. Just as instructors have long been trained in how to enhance student learning through effective questioning in classroom settings, they will be challenged to train students in effective question development for use with chatbots.

Librarians

Academic librarians will be key stakeholders in developing chatbot literacy strategies that help students and faculty members understand both the pitfalls and the potential of chatbot access. The skill with which they develop and continually refine expertise will be critical to the relative success of chatbot use within institutions. Librarians will not only be influential in how they engage with students in research consultations involving chatbot use. Perhaps more fundamentally, they can be critical partners in working with instructors to think through information gathering and use strategies that maximize the capabilities of chatbot use in student learning while safeguarding ethical standards and stimulating higher-level critical thinking. Librarians likewise will need to work closely with faculty members and other university colleagues to determine, and then disseminate information on, the skills and capabilities essential to effective chatbot use. This role will include but not be limited to raising awareness of information accuracy issues, cultivating critical thinking in terms of relevance, and developing skills for translating effective questioning into productive chatbot prompts. Librarians also can work with faculty members to develop notes and guidelines on chatbot use for their courses, such as use purpose, effective prompt examples, and expectations related to crediting chatbot contributions appropriately.
Librarians will require training so they can integrate tools that are widely used by patrons into their skill repertoire, knowledge practice, and daily services, including a focus on cultivating critical thinking through question development. The prerequisite is to understand and identify related concepts and corresponding ethical considerations, such as traditional question negotiation and questioning versus chatbot-related prompt intention, prompt behavior, and output use. Such differences, and the interactions between them, need to be examined carefully so librarians can grasp the context and focus when they conduct research consultations. Question negotiation is an important part of the research consultation, through which the patron and librarian introduce the research topic and clarify the research question, setting the foundation for developing effective search strategies. Further investigation is needed into how this process differs for interactions between users and chatbots, but here we present some initial observations.

Employing sound strategies of effective question design (EQD) should be an important focus in cultivating critical thinkers in the GPT-based chatbot era, as critical thinking and question development are interrelated and mutually reinforcing. Established questioning frameworks can serve as the threads that connect domain knowledge, language awareness, information literacy, and ethical considerations, which can enhance the development of critical thinking. GPT-based products can be effective tools in this process. Developing thought-provoking, well-structured questions encourages individuals to engage in deeper analysis, evaluate different perspectives, and synthesize information from various sources. To facilitate this, questioning theories and frameworks have been developed in the realms of philosophy, psychology, and education.
Bloom's taxonomy and the Socratic questioning method are two such established frameworks, providing a structured approach to crafting questions (Anderson, Krathwohl, and Bloom 2001; Elder and Paul 2002). These frameworks target different levels of cognitive complexity, ranging from basic knowledge recall to higher-order analysis, synthesis, and evaluation, and they can be used to design a general question matrix (Knowledge Compass 2023) to promote effective question design. Information literacy instruction that touches on GPT-based chatbots can give a general introduction to EQD, which can be enhanced by individual research consultations with subject librarians. Chatbots can be used as a tool to help librarians and instructors foster critical thinking in the classroom and in individual research consultations. In other words, starting from a general question matrix, subject librarians can help patrons develop individualized question matrices for particular projects to guide them in developing critical thinking skills. Combined with domain knowledge and language awareness, the question matrix can be transformed into ChatGPT prompts after careful trials according to the subject and purpose. Another, reverse approach is to formulate questions from research articles that embody strong critical thinking, turn these questions into prompts, and then use ChatGPT as an assistant in conducting critical thinking training. Classic articles in the domain can be used as "training cases" for critical thinking. ChatGPT may be a useful tool in alleviating the tension that has been perceived between subject matter instruction and critical thinking.
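To make the idea of transforming a general question matrix into chatbot prompts more concrete, here is a minimal sketch in Python. It is a hypothetical illustration only: the level names follow Bloom's taxonomy, but the template wording is an illustrative placeholder, not drawn from the study or from Knowledge Compass. A librarian and patron could refine the generated drafts together.

```python
# A minimal question-matrix sketch: Bloom's-taxonomy levels mapped to
# generic question templates, filled in with a patron's topic to produce
# draft chatbot prompts. Wording is illustrative, not prescriptive.
QUESTION_MATRIX = {
    "remember":   "What are the key terms and definitions related to {topic}?",
    "understand": "How would you summarize the main debates about {topic}?",
    "analyze":    "What assumptions underlie common arguments about {topic}?",
    "evaluate":   "What evidence would strengthen or weaken claims about {topic}?",
    "create":     "What new research question about {topic} remains unanswered?",
}

def build_prompts(topic: str, levels=None) -> list[str]:
    """Turn the generic matrix into topic-specific draft prompts."""
    levels = levels or list(QUESTION_MATRIX)
    return [QUESTION_MATRIX[level].format(topic=topic) for level in levels]

if __name__ == "__main__":
    for prompt in build_prompts("food insecurity among college students"):
        print(prompt)
```

In practice the matrix itself would be individualized per project, as described above; the point of the sketch is only that a shared, inspectable structure can sit between a questioning framework and the prompts a patron actually sends.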
Administrators

In addition to these dyadic interactions with students and instructors, librarians will be critical stakeholders in the development of broader chatbot use guidance for the university community. This will involve increasing interaction with selected university administrators in establishing policies on chatbot use. These stakeholders include not only higher-level library administrators, but also other university administrators charged with implementing teaching, learning, and assessment frameworks that cross university units. We have conducted some preliminary reviews of university policies regarding chatbot use, which appear to be very diverse. While further study is needed, our reviews suggest differing tensions between resisting or highly circumscribing chatbot use versus embracing its potential in learning. Another key issue is the extent to which administrative leaders attempt to develop centralized university policy in this area, as opposed to decentralizing it for unit-level decision-making or leaving important determinations to instructors or other personnel. Reviewing such policies more systematically is an interesting area for further research, and university librarians will be well suited to contribute to the dialogues needed to create best practices. They similarly will assume major responsibilities for developing broader-scale training sessions and advisories for diverse university stakeholders, including administrators. This is likely to be a daunting challenge for librarians already needing to keep pace with new information provision technologies in the rapidly changing digital space, but it likewise presents a vital opportunity for librarians to elevate their stature and impact in research consultations and in related university information strategy development.

Lessons Learned and Future Work

We began this project with limited knowledge about chatbot use in education and research, and as such we will reiterate only a few of the many lessons learned.
First, our case simulation demonstrated that GPT-based products differ substantially in their capabilities. Further research that differentiates these product capabilities in more depth is important, as findings in this respect will be useful not only to librarians but more widely to consumers. A related concern is that the one paid ChatGPT-based product we reviewed generally provided more sophisticated responses, and it would not be surprising if additional paid products of this nature provide higher-level capabilities than free ones. This raises questions concerning whether such chatbots will further extend the digital divide and other digital inequities.

Second, we were impressed by the power of these applications to support research writing and research product development, but also concerned about major limitations in what was produced. Working to clarify both the strengths and limits of these products, and developing related strategies for their effective and ethical use, requires additional attention. Research on how best to engage in fact-checking appears especially important with respect to the limitations we observed.

Third, the manner in which questions are framed, especially with regard to their specificity and related follow-ups and dialogues, is very important to minimizing drift and improving the overall quality of responses. Examining strategies for training users on how best to interact with chatbots in seeking information consequently merits attention, and librarians can play key roles in this area. Improving questioning strategy and integrating ethical considerations into questioning frameworks may be one promising direction for improving AI information literacy.
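Some of the search-support tasks librarians mediate in this space are partly mechanical. As a deliberately simplified, hypothetical sketch, the Python fragment below rewrites a PubMed-style title/abstract query into Scopus field syntax; real search-string translation involves controlled vocabularies, proximity operators, and wildcards that a toy script like this does not handle, which is exactly where chatbot assistance under librarian review may add value.

```python
import re

# Toy translator for one narrow pattern: PubMed "[tiab]" (title/abstract)
# tags rewritten as Scopus TITLE-ABS-KEY(...) clauses. Quoted phrases and
# single terms are both handled; everything else passes through unchanged.
def pubmed_to_scopus(query: str) -> str:
    """Rewrite term[tiab] or "phrase"[tiab] as TITLE-ABS-KEY(...)."""
    return re.sub(r'("[^"]+"|\w+)\[tiab\]', r'TITLE-ABS-KEY(\1)', query)

if __name__ == "__main__":
    print(pubmed_to_scopus('"food insecurity"[tiab] AND students[tiab]'))
    # → TITLE-ABS-KEY("food insecurity") AND TITLE-ABS-KEY(students)
```

The contrast is the point: a script is transparent but brittle, while a chatbot is flexible but must be fact-checked, reinforcing the training needs discussed above.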
Librarians can work with instructors and students to develop prompts that represent translations from questioning frameworks with built-in critical thinking and ethical considerations.

Fourth, as described in the previous section, librarians have many important roles to play in contributing to responsible AI practice. Those who develop a clear understanding of university AI-related ethics policies and build rich expertise and experience in using chatbots can employ them selectively as a tool in conducting research consultations. For example, referring to the ACRL Framework "knowledge practices" set (Association of College & Research Libraries 2016, 7), librarians can use chatbots to help patrons in developing and clarifying research questions, finding theories and frameworks, harvesting search terms, translating search strings between databases, advising on individualized prompting for their projects, and providing simple guidelines on the ethical use of output based on university and course policy. In the "dispositions" set, librarians can contribute to ethical chatbot use by helping patrons understand their relationship with chatbots, giving students tips on how to raise awareness of their thinking and emotions, and guiding them toward a growth mindset during chatbot interactions.

Finally, the ACRL Framework applies not only to individuals, but also to institutions and learners. When formulating policies, universities should be mindful of potential contradictions and ensure that the measures taken are balanced and do not have unintended consequences that restrict learning. We are especially concerned that the very real need to ensure ethical use of chatbots in learning does not inhibit institutions from simultaneously embracing and exploring their vast potential.

Documentation

Ethics Related Documentation

We consulted the following policies and codes of ethics.
• University of Illinois academic integrity policies:
  • Academic Integrity and Procedure
  • Students' Quick Reference Guide to Academic Integrity
  • Instructors' Quick Reference Guide to Academic Integrity
• The American Library Association (ALA) Code of Ethics
• National Association of Social Workers (NASW) Code of Ethics
• Council on Social Work Education (CSWE) 2022 Educational Policy and Accreditation Standards (EPAS)
• International Federation of Library Associations and Institutions (IFLA):
  • Committee on Freedom of Access to Information and Freedom of Expression (FAIFE): IFLA Statement on Libraries and Artificial Intelligence
  • IFLA Code of Ethics for Librarians and Other Information Workers
• Association of College & Research Libraries (ACRL):
  • Framework for Information Literacy for Higher Education (2016)
  • Companion Document to the ACRL Framework for Information Literacy for Higher Education: Social Work (2020)

Acknowledgements

The research case study was developed as part of an IMLS-funded Responsible AI project, through grant number LG-252307-OLS-22.

Competing Interests

The authors declare that they have no competing interests.

References

American Library Association. 2008. "Code of Ethics of the American Library Association." http://www.ala.org/advocacy/proethics/codeofethics/codeethics.

Anderson, Lorin W., David R. Krathwohl, and Benjamin S. Bloom. 2001. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. New York: Longman.

Association of College & Research Libraries (ACRL). 2011.
"ACRL Guidelines for Instruction Programs in Academic Libraries." http://www.ala.org/acrl/standards/guidelinesinstruction.

————. 2016. "Framework for Information Literacy for Higher Education." https://www.ala.org/acrl/standards/ilframework.

————. 2021. "Social Work, Companion Document to the ACRL Framework for Information Literacy for Higher Education." https://acrl.libguides.com/ld.php?content_id=62704385.

Association of College & Research Libraries (ACRL), Educational and Behavioral Sciences Section (EBSS), Social Work Committee. 2020. "Companion Document to the ACRL Framework for Information Literacy for Higher Education: Social Work." Accessed March 5, 2023. https://acrl.libguides.com/sw/about.

Council on Social Work Education. 2015. "2015 Educational Policy and Accreditation Standards [EPAS]." https://www.cswe.org/getattachment/accreditation/accreditation-process/2015-epas/2015epas_web_final.pdf.

————. 2022a.
"2022 Educational Policy and Accreditation Standards [EPAS]." https://www.cswe.org/accreditation/policies-process/2022epas.

————. 2022b. "2022 Educational Policies and Accreditation Standards [EPAS]: Frequently Asked Questions (Version 9.2.2022)." https://www.cswe.org/getmedia/67a67f0b-839e-420d-8cea-d919f9e6ca3a/2022-epas-faqs.pdf.

Elder, Linda, and Richard Paul. 2002. The Miniature Guide to the Art of Asking Essential Questions. Dillon Beach, CA: The Foundation for Critical Thinking.

International Federation of Library Associations and Institutions. 2012. "IFLA Code of Ethics for Librarians and Other Information Workers." IFLA Publications, August 2012. https://repository.ifla.org/handle/123456789/1850.

Knowledge Compass. n.d. Accessed March 7, 2023. https://www.knowledgecompass.org.

National Association of Social Workers. 2023. "Highlighted Revisions to the Code of Ethics." https://www.socialworkers.org/about/ethics/code-of-ethics/highlighted-revisions-to-the-code-of-ethics.

ShareGPT Dataset. 2023. Accessed November 14, 2023. https://huggingface.co/datasets/ryokoai/sharegpt52k.

Singer, Jonathan B., Melanie Sage, Stephanie Cosner Berzin, and Claudia J. Coulton. 2022. "Harness Technology for Social Good." In Grand Challenges for Social Work and Society, 2nd ed., edited by Richard P. Barth, Jill T. Messing, Trina R. Shanks, and James H. Williams, 230–256. New York: Oxford University Press.