LETTER TO THE EDITORS

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2023
https://doi.org/10.5860/ital.v42i4.16995

About this Section
Letters to the Editor reflect the opinions of their authors and are not necessarily those of the ITAL Editorial Board or ALA’s Core Division. Each letter’s copyright is held by its authors and is published under a Creative Commons CC-BY-NC-4.0 license.

Dear Editorial Board,

I read Richard Brzustowicz’s recent article in Information Technology and Libraries, “From ChatGPT to CatGPT,” and, while it excites me to see the conversation about AI and cataloging emerging in the scholarly discourse, I have unfortunately found a number of errors in the article’s methodology that, I believe, cannot lead in good faith to the conclusions offered. I have included below some supporting evidence, though my comments are not to be taken as a complete analysis of the article. In the interest of brevity, my comments will focus primarily on the methodology and conclusions sections of the article, in particular the first comparison of a generated record and one found in WorldCat. My feedback is structured in the form of responses to specific quotations from the article that present issues in the domain of cataloging, the evaluation of metadata quality, and the citation of inputs and outputs in interactions with generative AI.

“When asked about its training data, ChatGPT replied: …” (2)

What exactly was asked of ChatGPT? Without documenting the input, the output lacks sufficient context to draw firm conclusions about its validity. Inputs are a critical component of scholarship on AI, as the object of study consists of the input and the output together. For additional context, I have included a brief annotated bibliography of peer-reviewed scholarship that cites interactions with ChatGPT.

Choudhary, Om Prakash, Jyoti Saini, and Amit Challana.
“ChatGPT for Veterinary Anatomy Education: An Overview of the Prospects and Drawbacks.” International Journal of Morphology 41, no. 4 (August 2023): 1198–1202. https://doi.org/10.4067/s0717-95022023000401198. See Table 1 (1200). Inputs and outputs are arranged in a table to aid comprehension.

Gross, Nicole. “What ChatGPT Tells Us about Gender: A Cautionary Tale about Performativity and Gender Biases in AI.” Social Sciences 12, no. 435 (August 2023). https://doi.org/10.3390/socsci12080435. See Data Availability statement (12), “The illustrative examples (responses) used in this paper have been generated by ChatGPT in response to questions posed by the author (prompts) (https://chat.openai.com/). The prompts can be found in the reference list and full responses can be made available on request.”

Suppadungsuk, Supawadee, Charat Thongprayoon, Pajaree Krisanapan, Supawit Tangpanithandee, Oscar Garcia Valencia, Jing Miao, Poemlarp Mekraksakit, Kianoush Kashani, and Wisit Cheungpasitporn. “Examining the Validity of ChatGPT in Identifying Relevant Nephrology Literature: Findings and Implications.” Journal of Clinical Medicine 12, no. 5550 (2023). https://doi.org/10.3390/jcm12175550. See section 2.1 (3), “The search prompts provided to ChatGPT requested references in the Vancouver style, a commonly used citation style in academic writing, along with their corresponding links. We generated the prompt “Please provide the references in Vancouver style and their links in recent literature on...
name of the topic” to ChatGPT.”

“I asked ChatGPT to generate a MARC record for the 1996 edition of Anne Rice’s Interview with the Vampire using RDA (ChatGPT, personal communication, February 23, 2023).” (2)

The article again does not document the actual input given to ChatGPT; therefore, the amount of priming ChatGPT may have received, or any ambiguity in the prompt that could explain the composition of the generated record, is unknown. For example, consider these three 1996 publications of Anne Rice’s Interview with the Vampire that would be recorded on separate bibliographic records. In 1996, Knopf published the anniversary edition of the original 1976 Ballantine edition; Warner Books, the UK publisher and distributor, reissued its 1976 edition; and Boekerij, in Amsterdam, published a translation into Dutch with a parallel English title, making it a third 1996 release of Anne Rice’s Interview with the Vampire. Not all are true editions in the bibliographical sense, but they all were produced in 1996, they all bear the title Interview with the Vampire, and they all meet the description in the above paraphrase of the prompt given to ChatGPT.

The prompt also specifies that the record should use RDA; however, neither the generated record nor the record offered for comparison actually applies that standard, a fact that goes unacknowledged in the article. If they did, the records would be coded as RDA-compliant using an 040 field with the element “$e rda.” That is not to say that the records are totally non-compliant with RDA, though, as there are many similarities between RDA and its predecessor ruleset, the Anglo-American Cataloguing Rules, 2nd Edition (AACR2). There are also specifically RDA-compliant fields in both records, such as the 336, 337, and 338 fields. However, because the record found in WorldCat is not coded RDA, the most likely source of these fields is an automated process from OCLC, as evidenced by the record’s edit history.
Modifications to records are marked by institution codes added to the 040 field, and the WorldCat record selected shows five from OCLC: “$d OCLCO $d OCLCF $d OCLCQ $d OCLCO $d OCLCA” (9).

“I compared it to a record in OCLC’s WorldCat,” (2)

How was this record selected? In WorldCat, as accessed through OCLC Connexion, there are 36 bibliographic records that match the keywords Anne, Rice, 1996 and the title words interview, vampire. Nineteen of these records are cataloged in English. Only one is coded RDA (OCLC record #1300814022), and that record was not the one selected for comparison. Interestingly, it has some errors in its application of RDA, such as abbreviating “title page” to “Tp.,” which is incorrect, and lacking a relationship designator for the author (which should be used whenever appropriate, and the “author” relationship here is unambiguous). The record the author selected is OCLC #1052676753. It is a good record, without errors, but it does not meet the criteria the author set for the record ChatGPT was asked to generate, which invalidates it as a point of comparison. This is also true of the record in Table 6.

“The results of this test indicate that ChatGPT can produce an accurate and effective record for Interview with the Vampire.” (2)

This is incorrect, given the evidence presented. In just this first record, I found the following errors and inconsistencies. In the 100 field, the record uses neither the authorized access point for the author nor the relationship designator “author” in $e. The authorized access point is “Rice, Anne, 1941-2021.” The record also has no 250 field (edition). This 1996 copy is a special edition of the book, whether the Knopf or the Ballantine one, and an edition note is required.
The use of a 260 field is incorrect, because only the 264 is able to clearly disambiguate publication information from manufacture, distribution, production, and copyright, as required by RDA. In the 300 field, “pages” is abbreviated to “p.,” a practice that, while valid under AACR2, ceased with RDA. Lastly, and this may just be the formatting of the table, the 650 field’s indicator 0 is in the wrong position. These errors range from relatively minor to the kind of mistake that would fail a validation check in OCLC Connexion. Further, none of the errors listed falls within the area of cataloger’s judgment.

“As ChatGPT follows established cataloging rules, records created by the model are less likely to contain errors or inconsistencies;” (5)

The errors present in ChatGPT’s generated records directly contradict this claim.

“One concern is the potential for copyright infringement, as ChatGPT’s detailed descriptions of original works may be too like the originals, leading to legal issues for those who use the generated content without proper attribution or permission. This concern is particularly heightened for copyrighted works like books or music, where even small portions of the work can be protected.” (6)

This claim is offered without any supporting citations. The article does not engage with the discourse about fair use and the copyright status of text used to facilitate search and retrieval, or with specific rulings, such as those in Authors Guild, Inc. v. Google, Inc. and Authors Guild, Inc. v. HathiTrust.

“The study demonstrates that ChatGPT has the potential to significantly streamline the cataloging process in libraries by generating accurate and consistent records.” (6)

I have serious doubts about this conclusion, as my commentary above shows.
As a professional and a scholar in the area of cataloging, I find this to be both misleading and under-informative for readers who may someday be tasked with making hard decisions about the value of machine-generated metadata vs. the labor of catalogers and metadata specialists. As a result, I would suggest that you seriously consider retraction or significant revisions.

David Floyd
Chief Cataloging Librarian
Binghamton University
dfloyd@binghamton.edu