letter from the editor (june 2022) letter from the editors kenneth j. varnum and marisha c. kelly information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.15225 editorial board update i would like to open with a message of gratitude to the editorial board members who have helped shape the direction and focus of the journal over the past four years. steve bowers, kevin ford, cinthya ippoliti, ida joiner, michael sauers, and laurie willis have been fantastic colleagues, providing sage advice and thoughtful opinions through their tenures. together, they have reviewed dozens of articles for the journal but, more importantly, have helped shape the policies and directions we hope to take. together, we thought through and instituted our name change policy, a policy for revision of published articles, and ongoing efforts to identify sources of bias in editorial and reviewing practice. this work lays the foundation for future improvements. even as we say farewell to these editorial board members, it is my pleasure to welcome these individuals to the editorial board on july 1: ashlea green, mary a. guillory, dana haugh, shanna hollich, and cynthia schwarz. they were selected from an impressive pool of applicants. we are grateful for all who applied. we welcome submissions related to the intersection of cultural memory institutions (libraries, archives, and museums) and technology. our call for submissions outlines the topics and process for submitting an article for review. if you have questions or wish to bounce ideas off the editor and assistant editor, please contact either of us at the email addresses below. this issue’s contents the june “public libraries leading the way” column is contributed by julie lane at the county of prince edward public library and archives. 
lane describes how the covid-19 pandemic not only led to immediate changes to serve a geographically distributed community, but also broadened the library’s horizons in terms of advocating for and promoting equitable access to learning materials. our peer-reviewed content this month showcases topics including collection analysis, user-learner profiles, topic modeling, copyright bots, intangible cultural heritage, contactless services, and explainable artificial intelligence.
1. rarely analyzed: the relationship between digital and physical rare books collections / allison mccormack and rachel wittmann
2. ontology for the user-learner profile personalizes the search analysis of online learning resources: the case of thematic digital universities / marilou kordahi
3. applying topic modeling for automated creation of descriptive metadata for digital collections / monika glowacka-musial
4. classical musicians v. copyright bots: how libraries can aid in the fight / adam eric berkowitz
5. research on knowledge organization of intangible cultural heritage based on metadata / qing fan, guoxin tan, chuanming sun, and panfeng chen
6. contactless services: a survey of the practices of large public libraries in china / yajun guo, zinan yang, yiming yuan, huifang ma, and yan quan liu
7. explainable artificial intelligence (xai): adoption and advocacy / michael ridley
kenneth j. varnum, editor (varnum@umich.edu)
marisha c. kelly, assistant editor (marisha.librarian@gmail.com)
https://ejournals.bc.edu/index.php/ital/name-change-policy
https://ejournals.bc.edu/index.php/ital/call-for-submissions
https://ejournals.bc.edu/index.php/ital/article/view/13415
https://ejournals.bc.edu/index.php/ital/article/view/13601
https://ejournals.bc.edu/index.php/ital/article/view/13799
https://ejournals.bc.edu/index.php/ital/article/view/14027
https://ejournals.bc.edu/index.php/ital/article/view/14093
https://ejournals.bc.edu/index.php/ital/article/view/14141
https://ejournals.bc.edu/index.php/ital/article/view/14683

public libraries leading the way
gathering strength to combat access inequality: how a small rural public library supported virtual access for public school students, staff, and their families
julie lane
information technology and libraries | june 2022
https://doi.org/10.6017/ital.v41i2.15161
julie lane (jlane@peclibrary.org) is technology resource centre coordinator and educational resource consultant, county of prince edward public library and archives. © 2022.

prince edward county (pec) is located east of toronto and covers approximately 1,050 square kilometers. pec is part of the hastings prince edward district school board (hpedsb) and has a total of 6 public schools, one catholic school, and one private school. the other county serviced by our school board is hastings county. 
the county of prince edward public library (cpepl) system of 6 branches services just under 25,000 residents and countless seasonal visitors during the tourism season. our public school board services approximately 15,000 students across 7,220 square kilometers and 39 in-person schools and a k-10 virtual school across the two counties. starting off a technology column with a bunch of statistics is not exactly how i figured i would write this. however, context is key when discussing equity and access; and in this piece, i intend to highlight how both of those are made significantly easier to achieve for community stakeholders, with the presence of technology and education. when the stay-at-home orders were announced in march 2020 due to the covid-19 pandemic, we knew that we would not be able to hold our scheduled and planned public library programs. we turned to live streaming story times, maker programs, and author visits, all using what equipment we had on hand—tablets, laptops, and the internet. once it became clear that students in the public schools would not return to in-person learning within any short amount of time, all school boards in ontario ensured that enough chromebooks were purchased so that every student had their own dedicated device, with the assumption that providing a device meant all students could participate in remote learning. teachers rushed to transition their teaching plans to an online format; school administrators scrambled to schedule safe device pick-ups for students; and parents were not only juggling professional responsibilities and parenthood, but now teaching and tech support. although school boards provided tools to meet the “classroom” requirements, they could not ensure that every single student had access to a high-speed internet connection, nor could they offer school library access remotely. this is where the cpepl was able to offer support. 
the global shutdown had a significant impact on the relationship that the cpepl had with the schools in our county. a large focus of mine was to rebuild those working relationships to support students, staff, and families, and ultimately demonstrate in actionable ways how the local public library system was there for them. one immediate way i thought we could demonstrate support was through lending our wi-fi hotspots. hotspot lending programs through public libraries have gained popularity over the last few years. although our program had been in place for nearly 5 years, i am always surprised at the number of people who do not realize it is an available resource. with that in mind, i persistently reached out to the school administrators in our area and set up meetings to discuss how our borrow the internet program could benefit those working remotely without reliable internet. wait lists for our 9 available hotspot devices drastically increased, but our patron community was incredibly supportive of our students and would frequently request that their loan, which is at maximum 7 days in length, be passed to a student. though connecting families with internet hotspots was helpful for the required online learning, we could not fill the gap completely. if we had an unlimited communications budget, the situation would have been easily remedied, but, as we all know in the library world, budgets can be very tight. this fact pushed us to find creative ways to bring as many resources as possible to the students, staff, and families in our community. to broaden the reach to individual schools (and staying persistent with that outreach), i focused on not only ensuring that school communities knew what physical resources the library had, but also what electronic resources were available. 
these conversations and emails with school administrators led me to get in contact with the curriculum coordinator at the board office. this connection was a complete game changer. instead of us, as a public entity outside of the school community, contacting individual schools and trying to build relationships with teachers, librarians, and administrators, we had the person who oversaw all of the school librarians, library technicians, and curriculum development for the k-8 grades on our side. the coordinator was on board to help us make the desired connections with the schools in a number of ways. she put us in contact with the curriculum coordinator for the secondary grades (9-12) and our program and service list was sent from the board office to every teacher, principal, school librarian, and library technician in prince edward county. we were then able to set up a meeting with the coordinator of assistive technologies for the board, which set us on a track to completely revamp how we marketed and allocated our resources to schools. it became clear in our first conversation that we needed to get students connected with their public libraries as quickly and efficiently as possible. with students split between in-person learning, virtual learning, or a combination of the two, with still minimal to no access to school library borrowing, the online resources of the public library system seemed like the perfect solution. not only would connecting students, staff, and their families with their local public library be a way to get everyone reading, but we were fulfilling the opportunity to ensure that everyone had genuine and equitable access. what the school board had observed was that the required shift to remote learning made the inequality of literature access glaringly obvious. 
students who relied on their school library for reading were not getting that opportunity and students who had individual education plans were jumping through hoops to get digital copies of material. so although everyone had a school-supplied chromebook, not everyone had the same access. this is where public library subscriptions to hoopla and libby came to the rescue for providing current and popular literature in a variety of electronic formats for students to immediately access for both course reading and leisure enjoyment. connecting with like-minded, growth- and education-oriented people is incredibly empowering. the curriculum coordinators at the board office were so enthusiastic about connecting students, staff, and families in our school board with their public library that it made the next parts of the process not only successful, but fun as well! the curriculum coordinators and i created a presentation that we brought first to school administrators in prince edward county. having public library advocacy come from the school board was incredibly influential and a big step toward issuing library cards to students. once we had buy-in from the school administrators, we circulated registration forms for families to fill out and get everyone in their household public library access. we found that the easiest way to do this was using google forms. it was simple for parents to fill out and easy for library staff to glean the required information for card registration. since the library was also working with the virtual school, we needed to be able to issue library cards even if some students were not in our catchment area. it was common for virtual classes to consist of students from the smallest village in pec and all the way up to the northernmost part of hastings county, a full 3 hours’ drive away. cpepl was able to accommodate this need. 
pec is a tourist destination, and the library frequently issues cards for visitors staying in the area for an extended period of time under the rule that if you “work, live, or play” in pec, you are eligible for a public library card. once library cards were set up or renewed for all families who requested them through the google form, i got to work teaching students and staff how to access library resources. after communicating with the curriculum staff and public school administrators, it was decided that creating an informational presentation on getting started with hoopla was the best course of action. hoopla is an incredibly intuitive application with regard to the format possibilities (ebooks and audiobooks) as well as adjustable features within each format. the available settings and adjustment options make the reading experience as comfortable and accessible as possible for users. also, since there is no wait time to borrow materials, this allowed entire classes learning remotely to all check out the same title and read together. the material presented to students was easy to understand and interactive. the session provided ample time for students to follow along and test each feature in the hoopla app with their own individual book selections. the best part? this presentation was just the starting point. while we were only able to schedule and virtually deliver this presentation at two in-person schools, the other five schools in pec and a number of primary classes in the virtual school still participated in the google form for library card registration. teachers started asking what else the public library had to offer to enhance the curriculum delivery with additional resources. many community teachers were reminded of the public library’s services and resources (beyond just hoopla) and reached out for class visits or access to materials. other schools outside of our prince edward county catchment reached out and connected with their local public libraries, or vice versa. 
we are still working to develop ways to meet the needs of students, staff, and their families through the public library. some schools in the northern area of the region have students coming from multiple different public library catchment areas, and most of these libraries do not have the same resources as others, especially in the case of smaller systems. this posed an issue of equitable access for students: why should some students in the class have access to library online resources, and some not because they come from different/smaller communities? we were able to mitigate this issue with the virtual school, but for students attending in-person learning, we could not give library cards to every student in the school board. thankfully, another public library system in our area stepped up to offer virtual library access to any student or teacher in hastings county (so everywhere except prince edward county). this recognition of the importance of equitable access enabled students to not only regain access to a public library system, but it also ensured that all students could access books in the way that best suited them. when i ask a class if listening to an audiobook counts as reading, it amazes me that the majority of the class say “no.” or if i ask students if they had ever read an ebook, some would say it was not a “real” book. these comments and notions are not only untrue, but they are also exclusionary. countless students need other formats than just printed materials. how many would benefit from listening to an audiobook along with reading a printed version? how many students dislike reading because it is just hard to see the words, but if the text was more spaced out, or a different font, it would make all the difference? 
how many times is a student not able to access a book they want because all available copies are already checked out at their school library? these are issues students in the classes i work with face. having a public library card can significantly ease these barriers to access. all in all, we processed hundreds of card requests and renewals and were able to powerfully illustrate to teachers how they could meaningfully integrate public library resources into their classrooms, either virtually or physically. our requests for library visits came back up to pre-pandemic levels, but we were working with more schools than we had previously. teachers were, and still are, reaching out and asking if we can get extra copies of books, or if we can lead virtual novel studies. one of our more popular pieces of progress is the integration of our coding programs with other subjects. currently, i am running a ukulele program where students are writing group arrangements using binary code as the basis for composition. we have classes doing art projects with robotics and integrating math learning objectives. we have done virtual story time and connected the story to creating scratch programs. the possibilities are endless, and now that we once again have the interest from teachers, we are working with them to support their students and all the learning that comes with incorporating technology and maker-thinking into a classroom environment. the momentum has not let up, and we are beyond thrilled. our communities and local school board have embraced the reality that public libraries are more than just books. public libraries are a critical part of any community and have the power to be a meaningful component to education at all levels. having schools and all educational stakeholders using public library services not only broadens the reach of a public library, but also broadens our advocacy potential. 
we know there is still a long way to go in terms of genuine equitable access, especially when it comes to technology. internet connectivity and technology literacy are just the tip of the iceberg, but when organizations support each other to truly serve their community, collectively, that is how you make change.

article
classical musicians v. copyright bots: how libraries can aid in the fight
adam eric berkowitz
information technology and libraries | june 2022
https://doi.org/10.6017/ital.v41i2.14027
adam eric berkowitz (berkowitza@hcflgov.net) is supervisory librarian, tampa-hillsborough county public library. © 2022.

abstract
the covid-19 pandemic forced classical musicians to cancel in-person recitals and concerts and led to the exploration of virtual alternatives for engaging audiences. the apparent solution was to livestream and upload performances to social media websites for audiences to view, leading to income and a sustained social media presence; however, automated copyright enforcement systems add new layers of complexity because of an inability to differentiate between copyrighted content and original renditions of works from the public domain. this article summarizes the conflict automated copyright enforcement systems pose to classical musicians and suggests how libraries may employ mitigation tactics to reduce the negative impacts when uploaders are accused of copyright infringement.

introduction
the covid-19 pandemic, unlike anything the country has seen in a century, forced industries to reevaluate the manner in which they provide services to the public. businesses and citizens everywhere made hairpin turns as they quickly searched for virtual alternatives to everyday in-person activities. with many remaining home for extended periods of time, demand for digital content and entertainment skyrocketed. 
in may 2020, comcast reported a 40% increase in online video streaming since march 1, just weeks before governments instated stay-at-home mandates.1 throughout the year, subscription-based streaming services saw enormous surges in customer usage and, likewise, social media platforms saw a significant spike in content production and consumption.2 daily blogging on facebook replaced in-person interactions, and youtubers generated higher volumes of videos to meet viewer demand. classical musicians were also heavily reliant on social media platforms in order to showcase performances as pointed out in the washington post article “copyright bots and classical musicians are fighting online. the bots are winning.” highlighted by the american library association’s american libraries, the article illustrated the toll social media content moderation algorithms took on classical musicians sharing their performances online.3 this article became the starting point for the 2021 study “are youtube and facebook canceling classical musicians?,” which investigated the relationship between classical musicians and automated copyright enforcement systems.4 the following summarizes this study’s findings and draws attention to the role libraries can play in aiding classical musicians facing copyright infringement claims.

automated copyright enforcement
evidence shows that automated copyright enforcement systems wrongfully remove user-uploaded materials in the name of copyright protections on a regular basis.5 in fact, it happens so often that the australian broadcasting corporation began wittily dubbing such instances “copywrongs.”6 these algorithms are not designed to distinguish between recordings of music owned by record labels and those shared online by freelance musicians. 
they are instructed to recognize copyrighted recordings and content resembling those recordings as identical matches, ensuring the protection of intellectual property from unauthorized reproduction. as such, automated content moderation systems are incapable of making allowances for the performance of works from the public domain. such performances comprise nearly all of a classical musician’s repertoire. automated copyright enforcement systems are typically based on a combination of matching and classification methods. the most effective matching technique for content moderation is perceptual hashing, which isolates unique strings of data (hashes) taken from an uploaded file and compares distinguishing markers and patterns to a database of samples provided by copyright owners.7 this technique allows systems to detect exact matches and iterations of the original work, such as live recordings and remixes.8 among classification methods, artificial neural networks with deep learning are best suited to the task of algorithmic moderation. consisting of a network of nodes, they are meant to simulate the structure and function of neural networks in animals and humans.9 this enables them to solve multifaceted, dynamic problems, which makes them ideal for instantaneous content moderation, allowing them to identify musical similarities in real time.10 both youtube and facebook enable users to upload recordings and broadcast live feeds to their websites. matching techniques are used to review prerecorded content since the upload process allows for automated systems to sample the material for comparison to the companies’ hash databases before allowing the recording to be posted.11 in contrast, live broadcasts are transmitted instantaneously and allow for no time to review the footage before it is visible online. 
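the matching step described above can be illustrated with a toy example. the sketch below is not the method used by youtube or facebook; it swaps in a simple "average hash" (one bit per frame, set when the frame is louder than the recording's mean) as a stand-in for a production perceptual hash. the per-frame loudness values, the function names, and the `max_distance` threshold are all hypothetical.

```python
from statistics import mean


def average_hash(features):
    """Toy perceptual hash: one bit per frame, 1 if the frame is above the mean."""
    threshold = mean(features)
    return [1 if value > threshold else 0 for value in features]


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_match(upload, reference, max_distance=1):
    """Flag the upload when its hash is within max_distance bits of the reference."""
    distance = hamming_distance(average_hash(upload), average_hash(reference))
    return distance <= max_distance


# Hypothetical per-frame loudness values (assumed data, not real measurements).
label_recording = [0.2, 0.8, 0.5, 0.9, 0.1, 0.7, 0.3, 0.6]
# A different musician playing the same public-domain piece follows the same shape.
home_performance = [0.25, 0.75, 0.55, 0.85, 0.15, 0.65, 0.35, 0.6]
# An unrelated piece has a different shape.
unrelated_piece = [0.9, 0.1, 0.8, 0.2, 0.7, 0.1, 0.9, 0.2]

print(is_match(home_performance, label_recording))  # the independent performance is flagged
print(is_match(unrelated_piece, label_recording))   # the unrelated piece is not
```

because the hash keeps only the coarse shape of the music, an independent performance of the same score lands within the match threshold of the label's recording. that is the behavior the article describes: tolerance that is useful for catching remixes also catches original renditions of public domain works.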
because live footage cannot be reviewed in advance, hashes cannot be sampled from streaming content, so classification methods using training data must identify infringing material on the fly.12 while these algorithms make content moderation easier, they are limited in their capacity. one study showed that youtube is surprisingly inaccurate in its attempts to recognize infringing material in live broadcasts, failing to identify 26% of copyrighted footage within the first thirty minutes of streaming and blocking 22% of non-infringing livestreams.13 research strongly suggests that the only factors considered by music copyright enforcement systems are pitch, volume, and melodic and harmonic contour.14 those values alone cannot be used to distinguish copyrighted works from the public domain. as such, these systems are not yet advanced enough to account for the total complexity of human creativity, and human intervention is required before these programs systematically accuse uploaders of copyright infringement.15 compositions in the public domain are not subject to copyright; however, recorded performances of compositions from the public domain can be copyrighted. individuals may upload or livestream their own performances of classical music without fear of infringing copyright but may not upload another musician’s copyrighted recordings of the same pieces. for example, no one owns the copyright to bach’s cello suites and, therefore, anyone can profit from performing these works. sony music, though, owns the copyright to yo-yo ma’s recordings of bach’s cello suites, and anyone uploading these specific recordings to social media would be infringing copyright and subject to the repercussions. unfortunately, automated copyright enforcement systems often misidentify an individual’s performances as copyrighted recordings.

the impact on classical musicians
classical musicians are accustomed to having their content misidentified as infringing copyright, but with the pandemic forcing many more musicians to share performances regularly on social media, the problem has become ever more pervasive. adrian spence, the artistic director for chamber ensemble camerata pacifica, found himself appealing multiple copyright claims from both facebook and youtube. on occasion, he would dispute several claims issued by different copyright owners for the same recording. until these issues were resolved, facebook suspended camerata pacifica’s ability to livestream, and youtube displayed a notification on their channel informing viewers that their videos were likely to be removed due to anticipated copyright infringement.16 owen espinosa, a high school senior, was preparing for a piano recital, and during rehearsal, facebook ended his livestream over claims of copyright infringement. he was unable to successfully appeal the claim, which meant that facebook would not host his performance. instead, he had to broadcast his recital on an acquaintance’s youtube channel.17 michael sheppard, a professional pianist, has had broadcasts interrupted and videos removed by facebook multiple times with notifications stating that music owned by naxos of america was detected in his performances.18 after facebook rejected his disputes, sheppard took to twitter, alerting naxos of his situation. his videos were eventually restored, but nothing could be done about his livestreams.19 the violinist.com broadcasts weekly, hour-long concerts featuring multiple guest musicians. during one of these performances, facebook muted child violinist yugo maeda due to a claim of copyright infringement. 
after the notice was appealed, facebook unmuted maeda’s performance three days later.20 while covid-19 exacerbated the issue, classical musicians often had their performances interrupted or removed from social media before the pandemic. in 2019, conducting students at the university of british columbia had their facebook live feed interrupted over copyright infringement claims and, in 2018, facebook removed a recording of an in-home performance given by pianist james rhodes, also stating that the music infringed copyright.21 also in 2018, the australian broadcasting corporation’s abc classic fm livestreamed a performance of beethoven’s symphony no. 9. the broadcast ended with facebook issuing a claim stating that the music in question was owned by two different copyright owners.22 in 2016, violinist claudia schaer disputed several of youtube’s copyright claims. she typically had success with these appeals, but one of her recordings received three claims from different copyright owners. she was able to refute two of them; however, the third remained, and she was warned that if she was unsuccessful in her second attempt at appealing the claim, her account would receive a copyright strike, deleting her video from the site permanently. she felt both intimidated and aggravated by the ordeal.23 the author of this article has also had to refute a copyright infringement claim on youtube. according to the notice, 51 seconds of the author’s approximately five-minute performance of beethoven’s “für elise” infringed copyright. as a result, the claimant authorized youtube to include ads in the video, allowing them to generate revenue. the dispute was upheld after the claimant’s 30-day window for a response expired. although the author does not rely on monetized videos and livestreams for income, it is unethical for another entity to profit from the work of an unaffiliated individual.

disputing a copyright claim
while there is recourse for uploaders facing copyright claims from social media sites, the appeals process can be lengthy and overwhelming. it can take more than two months for youtube to render a verdict when a musician disputes a copyright notice. during this span of time, classical musicians depending on ad revenue cease to generate income as these funds are held by the company until a final decision is made, at which point all profits accumulated by the video are released to the appropriate party. if the claim is upheld, the recording may remain online with proceeds going to the supposed copyright owner.24 uploaders may attempt to refute the result, but a failed appeal leads to the video’s removal and a copyright strike levied against the uploader, preventing them from livestreaming and monetizing videos for three months. should this occur, a counter notification can be issued which insists that the content in question has been mischaracterized as infringing and requires that would-be copyright owners file a lawsuit to uphold the claim. after three strikes, accounts are permanently deleted along with all associated uploads.25 the time that elapses for a final verdict along with the suspension of uploading and livestreaming permissions due to a copyright strike amounts to more than five months without being able to sustain an income. when a single performance is charged with multiple claims from different entities, as in the aforementioned examples, the uploader must dispute each one individually. this makes it easy to accumulate copyright strikes, risking account termination. it would be reasonable to assume that many classical musicians who endure these circumstances avoid the dispute process for fear of youtube removing their recordings, enforcing limitations on their ability to broadcast and monetize videos, and even permanently deleting their accounts. 
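the timeline arithmetic in this section can be sketched in a few lines of python. this is an illustration of the figures quoted above, not youtube's actual logic: the constants treat "more than two months" as a 60-day floor and "three months" as 90 days, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Assumptions taken from the figures in the text, rounded to whole days.
DISPUTE_REVIEW_DAYS = 60       # "more than two months" for a verdict, used as a floor
STRIKE_PENALTY_DAYS = 90       # "three months" without livestreaming or monetization
STRIKES_BEFORE_DELETION = 3    # "after three strikes, accounts are permanently deleted"


def earliest_income_resumption(dispute_filed: date) -> date:
    """Earliest date income can resume if the dispute fails and a strike follows."""
    verdict = dispute_filed + timedelta(days=DISPUTE_REVIEW_DAYS)
    return verdict + timedelta(days=STRIKE_PENALTY_DAYS)


def account_deleted(strike_count: int) -> bool:
    """True once the uploader has accumulated the number of strikes that ends an account."""
    return strike_count >= STRIKES_BEFORE_DELETION


filed = date(2021, 1, 1)
resumes = earliest_income_resumption(filed)
print(resumes, (resumes - filed).days)  # at least 150 days, roughly five months
print(account_deleted(3))               # three failed disputes would end the account
```

since a single performance can draw several claims that must each be disputed, the strike counter can reach three from one upload, which is why the text describes account termination as an easy risk to run into.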
meanwhile, mistakenly recognized copyright owners can exploit this reluctance by appropriating the earnings generated by the work of unaffiliated musicians. furthermore, should the matter be redirected to the courts, the uploader faces the burden of retaining legal counsel. youtube algorithms deal with approximately 98% of all copyright issues and, because youtube’s business model generates profits primarily via user-uploaded content, it has been found to show bias towards established copyright owners.26 copyright owners can set preferences for how they want the system to react to instances of copyright infringement, resulting in the automatic monetization of 95% of claims for the copyright owner. as a result, user uploads make up 50% of the revenue generated by youtube for the music industry.27 although google reported in 2018 that 60% of disputed claims were found in favor of accused uploaders, the system clearly benefits established copyright owners.28 all of the aforementioned musicians who were accused of copyright infringement had their livestreams interrupted, saw their videos removed, or witnessed companies profiting from their work performing music that has long since passed into the public domain. youtube’s video series copyright and content id on youtube attempts to educate users on how automated copyright enforcement and the dispute process work, and while fair use and copyright permissions are discussed, the public domain is never mentioned, although youtube does offer a brief explanation of the public domain on its help site.29 according to the us copyright act, the duration of copyright extends to 70 years after the death of the known composer, and for uncredited compositions or those composed by a musician under a pseudonym, copyright is recognized for 95 years from the date the work was published or 120 years from when it was composed, whichever expires first.30 while record labels are fully within their rights to protect the recordings they own, that should have no bearing on individual musicians performing pre-twentieth-century music. the majority of online music consumption occurs on social media sites with 47% of the market share going to youtube.31 reports from deezer showed a near 20% increase in users listening to classical music since the start of the pandemic.32 given that more users are gravitating towards listening to classical music, and that the most popular digital access point for music is youtube, classical musicians coping with pandemic-induced restrictions were presented with what should have proven to be a lucrative opportunity. adhering to social distancing requirements and stay-at-home mandates meant musicians cancelled their performances, leading to an exploration of virtual alternatives such as uploading recordings and livestreaming. obstructing these activities interrupts their sole source of income.

conclusion
while researchers have suggested a handful of improvements for automated copyright enforcement systems, they have not addressed the role that libraries can play in assisting classical musicians.33 the tampa-hillsborough county public library, prior to the spread of covid-19, maintained four branches outfitted with recording studios; today, that number has grown to five. prior to pandemic library closures, recording studios were reserved just over 800 times, amounting to about 1,600 hours of usage between january 1, 2019 and march 13, 2020. patrons using the recording studios produce music and videos with the intention of uploading them to social media. other libraries with recording studios likely see their patrons doing the same, but without knowledge of copyright. libraries have the means and the motive to assist classical musicians. 
libraries can hold classes covering the basics of copyright, fair use, and the public domain, or that expand upon how automated copyright enforcement systems work on social media. library staff, however, may feel overwhelmed by the numerous texts on these subjects and may not know where to begin. an excellent starting point is the frequently asked questions page on the us copyright office website. this webpage offers explanations for a broad array of copyright-related issues and questions.34 fair use allows for unauthorized borrowing from a creative work; however, navigating how fair use is determined is always challenging. steven m. davis’ “computerized takedowns: a balanced approach to protect fair uses and the rights of copyright owners” is a reliable point of reference for defining fair use, its application in copyright infringement cases, and ethical and legal implications regarding the limitations of algorithmic moderation systems.35 for a thorough look into the mechanics and applications of automated copyright enforcement, refer to the previously mentioned “are youtube and facebook cancelling classical musicians?” this article offers a synopsis on the shift from physical to digital media, descriptions of different algorithmic models developed specifically for copyright enforcement, and an account of how youtube’s and facebook’s copyright enforcement systems came to be.36

libraries can also offer help sessions that support patrons through the copyright claims dispute process. the youtube dispute interface is user friendly, and the instructions are comprehensible. throughout each step, explanations are offered to clarify what is being required of the user. for example, when asked for the reasoning behind the dispute, the user is offered four options: the disputed material is original content, the user has acquired permission to reproduce the content, the content falls under fair use, or the content originates from the public domain.
once selected, additional explanations for each option are given in order to provide further clarification and context which allows the user to reconsider their choice and also helps the user better explain how their content falls under the selected category. finally, the user is asked to provide a narrative explaining how the content in question does not infringe copyright. facebook’s counternotification process is less generous, providing brief, ineffectual descriptions of copyright and a simple form requesting the user’s personal information and explanation for why the copyright infringement claim is unfounded.

after library staff demonstrate the use of these interfaces, patrons can be guided to library resources to help them articulate and refine their arguments. for anything that cannot be found among the library’s collections, library staff may need to assist with internet searches, or patrons may request materials through interlibrary loan. additionally, patrons may still feel overwhelmed by the terminology being presented, which would further support the need for library programming that covers copyright-related topics. when considering the research involved to produce a convincing counterargument, information literacy and metaliteracy classes may be warranted.

libraries can also encourage patrons to include descriptions in their uploads and livestreams with links to supporting evidence explaining that the featured music belongs to the public domain, and as the uploader, they own the rights to recordings and broadcasts of their own performances.
the public domain description on youtube’s help page provides links to columbia university libraries’ copyright advisory service and cornell university’s copyright information center, and it suggests that these resources can lead to supporting evidence regarding works in the public domain.37 another excellent resource is the international music score library project’s petrucci music library. this database of almost 200,000 compositions belonging to the public domain features both sheet music and recordings of each of these works.38 users can also point to the public domain song anthology, a book comprising 348 popular songs from the public domain; the entire text can be downloaded from the publisher’s website.39 these resources and explanations can be included in disputes to support the reasoning for why a copyright claim is invalid.

it should be noted that library employees are most often not lawyers, and as such, it is ill-advised to answer direct questions about the specific legality of the myriad of situations musicians face when disputing copyright claims. these matters require expert, specialist knowledge with which library staff are not equipped. the role of the library should only be to provide access to resources and inform the public on various issues regarding the use of information. as information specialists, librarians are in a unique position to educate patrons on information policy, and in this case, copyright. library systems with law libraries or with access to law collections and databases would be especially suited to teach patrons about copyright, guide them through the dispute process, and assist them with gathering resources to support their counterarguments. the tampa-hillsborough county public library and other systems like it that are outfitted with both music recording studios and a law library are encouraged to offer such services.
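when patrons gather public domain evidence for a dispute, the term rule summarized earlier from the us copyright act (70 years after a known composer’s death; otherwise 95 years from publication or 120 years from composition, whichever expires first) can be turned into simple date arithmetic. the python sketch below encodes only that summary. the function names are hypothetical, the real statute has further conditions that are not modeled here, and the result is a rough research aid for library programming, not a legal determination.

```python
from typing import Optional


def copyright_expiry_year(
    author_death_year: Optional[int] = None,
    publication_year: Optional[int] = None,
    creation_year: Optional[int] = None,
) -> int:
    """Return the last year of protection under the simplified rule above.

    Hypothetical helper: it models only the three terms quoted in the article.
    """
    if author_death_year is not None:
        # Known composer: 70 years after death.
        return author_death_year + 70

    # Uncredited or pseudonymous work: 95 years from publication or
    # 120 years from composition, whichever expires first.
    candidates = []
    if publication_year is not None:
        candidates.append(publication_year + 95)
    if creation_year is not None:
        candidates.append(creation_year + 120)
    if not candidates:
        raise ValueError("at least one year must be supplied")
    return min(candidates)


def is_public_domain(current_year: int, **work_years: Optional[int]) -> bool:
    """True when the simplified term has already run out."""
    return current_year > copyright_expiry_year(**work_years)


# A composition whose composer died in 1750 left protection long ago.
baroque_expiry = copyright_expiry_year(author_death_year=1750)
baroque_is_free = is_public_domain(2022, author_death_year=1750)

# An uncredited work composed in 1900 and published in 1950:
# min(1950 + 95, 1900 + 120) = 2020.
anonymous_expiry = copyright_expiry_year(publication_year=1950, creation_year=1900)
```

a worked date like this can sit alongside the links to imslp or the public domain song anthology in a dispute narrative, while the musical recording itself remains a separate question that the sketch does not address.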
hopefully, this overview of automated copyright enforcement, its impacts on classical musicians, and the suggestions to libraries offered here will promote further conversation that eventually leads to action and a possible solution. perhaps, as progress is made, automated copyright enforcement systems will grow more hospitable towards user-generated recordings and livestreams of classical music. after all, social media should be able to freely host the artistic talents of all musicians.

endnotes

1 “covid-19 network update,” comcast, may 20, 2020, https://corporate.comcast.com/covid-19/network/may-20-2020.

2 julia alexander, “the entire world is streaming more than ever—and it’s straining the internet,” the verge, march 27, 2020, https://www.theverge.com/2020/3/27/21195358/streaming-netflix-disney-hbo-now-youtube-twitch-amazon-prime-video-coronavirus-broadband-network; ella koeze and nathaniel popper, “the virus changed the way we internet,” the new york times, april 7, 2020, https://www.nytimes.com/interactive/2020/04/07/technology/coronavirus-internet-use.html.

3 michael andor brodeur, “copyright bots and classical musicians are fighting online. the bots are winning,” the washington post, may 21, 2020, https://www.washingtonpost.com/entertainment/music/copyright-bots-and-classical-musicians-are-fighting-online-the-bots-are-winning/2020/05/20/a11e349c-98ae-11ea-89fd-28fb313d1886_story.html.

4 adam eric berkowitz, “are youtube and facebook cancelling classical musicians? the harmful effects of automated copyright enforcement on social media platforms,” notes 78, no. 2 (december 2021): 177–202.

5 rebecca tushnet, “all of this has happened before and all of this will happen again: innovation in copyright licensing,” berkeley technology law journal 29, no. 3 (december 2014): 1147–87.
6 matthew lorenzon, “why is facebook muting classical music videos?” abc classic fm, december 21, 2018, https://www.abc.net.au/classic/read-and-watch/music-reads/facebook-copyright/10633928.

7 xia-mu niu and yu-hua jiao, “an overview of perceptual hashing,” acta electronica sinica 36, no. 7 (2008): 1405–11.

8 robert gorwa, reuben binns, and christian katzenbach, “algorithmic content moderation: technical and political challenges in the automation of platform governance,” big data & society 7, no. 1 (january 2020): 7.

9 larry hardesty, “explained: neural networks,” mit news, april 14, 2017, https://news.mit.edu/2017/explained-neural-networks-deep-learning-0414.

10 daniel graupe, principles of artificial neural networks, 3rd ed. (hackensack, nj: world scientific publishing company, 2013), 1–3.

11 gorwa, binns, and katzenbach, “algorithmic content moderation,” 7.

12 daniel (yue) zhang, jose badilla, herman tong, and dong wang, “an end-to-end scalable copyright detection system for online video sharing platforms,” in proceedings of the 2018 ieee/acm international conference on advances in social networks analysis and mining (barcelona, spain: ieee press, 2018), 626–27.

13 daniel (yue) zhang et al., “crowdsourcing-based copyright infringement detection in live video streams,” in proceedings of the 2018 ieee/acm international conference on advances in social networks analysis and mining (barcelona, spain: ieee press, 2018), 367.

14 berkowitz, “are youtube and facebook cancelling classical musicians?,” 200.

15 diego cerna aragon, “behind the screen: content moderation in the shadows of social media,” critical studies in media communication 37, no. 5 (october 19, 2020): 512–14.

16 brodeur, “copyright bots and classical musicians are fighting online.”

17 amy williams, “camerata pacifica to stream high school graduate’s senior recital,” classical candor: classical music news and reviews (blog), june 6, 2020, https://classicalcandor.blogspot.com/2020/06/classical-music-news-of-week-june-6-2020.html.

18 baltimore school for the arts, “sometimes you have to fight!,” facebook, may 22, 2020, https://www.facebook.com/baltimoreschoolforthearts/posts/sometimes-you-have-to-fight-our-michael-sheppard-was-recently-giving-a-facebook-/3146142648740808/.
19 michael sheppard (@pianistcomposer), “dear @naxosrecords please stop muting portions of works whose composers have been dead for hundreds of years.” twitter, may 9, 2020, https://twitter.com/pianistcomposer/status/1259118489622777856.

20 laurie niles, “facebook and naxos censor music student playing bach,” violinist.com (blog), july 13, 2020, https://www.violinist.com/blog/laurie/20207/28375/.

21 brodeur, “copyright bots and classical musicians are fighting online”; ian morris, “facebook blocks musician from uploading his own performance—but did he break copyright?” daily mirror, september 7, 2018, https://www.mirror.co.uk/tech/facebook-blocks-musician-uploading-performance-13208194.

22 matthew lorenzon, “why is facebook muting classical music videos?” abc classic fm, december 21, 2018, https://www.abc.net.au/classic/read-and-watch/music-reads/facebook-copyright/10633928.

23 claudia schaer, “youtube copyright issues,” violinist.com (blog), february 15, 2016, https://www.violinist.com/discussion/archive/27589/.

24 “monetization during content id disputes,” youtube help, accessed october 24, 2019, https://support.google.com/youtube/answer/7000961?hl=en&ref_topic=9282678#zippy=,filing-a-content-id-dispute,more-info-about-the-content-id-dispute-process,filing-a-content-id-appeal,more-info-about-the-content-id-appeal-process.
25 “copyright strike basics,” youtube help, accessed october 24, 2019, https://support.google.com/youtube/answer/2814000#zippy=,what-happens-when-you-get-a-copyright-strike,resolve-a-copyright-strike.
26 google, how google fights piracy (november 2018), 14, https://www.blog.google/documents/27/how_google_fights_piracy_2018.pdf; joanne e. gray and nicolas p. suzor, “playing with machines: using machine learning to understand automated copyright enforcement at scale,” big data & society 7, no. 1 (april 2020): 1–15.

27 karl borgsmiller, “youtube vs. the music industry: are online service providers doing enough to prevent piracy?” southern illinois university law journal 43, no. 3 (spring 2019): 660.

28 google, how google fights piracy, 28–31.

29 youtube creators, copyright and content id on youtube, october 12, 2020, accessed december 11, 2021, https://www.youtube.com/playlist?list=plpjk416fmkwrnrbv72kshryeknnsaafkd; “frequently asked copyright questions,” youtube help, accessed october 24, 2019, https://support.google.com/youtube/answer/2797449#c-pd&zippy=,what-is-the-public-domain.

30 “how long does copyright protection last?” copyright.gov, us copyright office, https://www.copyright.gov/faq/faq-duration.html.

31 adam j. reis and manon l. burns, “who owns that tune? issues faced by music creators in today’s content-based industry,” landslide 12, no. 3 (january & february 2020): 13–16.

32 maddy shaw roberts, “research shows huge surge in millennials and gen zers streaming classical music,” classic fm, august 19, 2020, https://www.classicfm.com/music-news/surge-millennial-gen-z-streaming-classical-music/.

33 berkowitz, “are youtube and facebook cancelling classical musicians?,” 199–201.

34 “frequently asked questions,” copyright.gov, us copyright office, https://www.copyright.gov/help/faq.

35 steven m. davis, “computerized takedowns: a balanced approach to protect fair uses and the rights of copyright owners,” roger williams university law review 23, no. 1 (winter 2018): 1–24.

36 berkowitz, “are youtube and facebook cancelling classical musicians?,” 177–202.

37 “frequently asked copyright questions,” youtube help.
38 “main page,” (website), imslp: petrucci music library, accessed december 12, 2021, https://imslp.org/wiki/main_page.

39 david berger and chuck israels, the public domain song anthology: with modern and traditional harmonization (charlottesville: aperio, 2020), https://aperio.press/site/books/m/10.32881/book2/.

article

research on knowledge organization of intangible cultural heritage based on metadata

qing fan, guoxin tan, chuanming sun, and panfeng chen

information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.14093

qing fan (fanqmy@hotmail.com) is phd student, jingchu university of technology and central china normal university. guoxin tan (gxtan@mail.ccnu.edu.cn) is professor, central china normal university. chuanming sun (cms@ccnu.edu.cn) is assistant professor, central china normal university.
panfeng chen (94388389@qq.com) is phd student, guizhou university. © 2022.

abstract

metadata has been analyzed and summarized. based on dublin core metadata, combined with the characteristics and forms of intangible cultural heritage, this article explores the metadata for intangible cultural heritage in knowledge organizations based on relevant resource description standards. the wuhan woodcarving ship model is presented as an example of national intangible cultural heritage to control the application of metadata in intangible cultural heritage knowledge organizations. new ideas are provided for the digital development of intangible cultural heritage.

introduction

intangible cultural heritage includes traditions or living expressions inherited from our ancestors and passed on to our descendants. digital storage and presentation of intangible cultural heritage resources is an inevitable requirement for the protection of china’s long history and its culture in the information age. with the rapid development of artificial intelligence and big data, all kinds of massive data in the internet age are expanding, necessitating the development of a database platform for the inheritance and protection of intangible cultural heritage. at the same time, organizations must consider how to deal with the intangible cultural heritage using complex data. searching for data and visualizing the relationship with intangible cultural heritage is a current research hotspot. however, at this stage, there are still some problems in the construction of digital resources of intangible cultural heritage in china, such as the establishment of accurate and interoperable metadata. in this process, the diversity and uniqueness of intangible cultural heritage items needs to be fully considered, including the subsequent integration of digital resources and its existing digital resource system of intangible cultural heritage in china.
therefore, the construction of the intangible cultural heritage resource database is not only to simply organize and list the data, but more importantly, to reveal the relationships between the knowledge content and resources in the intangible cultural heritage field and to build a thorough and relevant knowledge system.

research status at home and abroad

metadata is data that describes the attributes of a certain type of resource (or object). metadata can be used to locate and manage the resource and display information about it.1 metadata can also be structured data used to describe online information resources and strengthen the collection development, organization, and utilization of online information resources.2 from the perspective of knowledge organization, general metadata is used to describe the theme, content, and characteristics of information resources. the most common metadata format is dublin core (dc) metadata, which is structured and descriptive. the creation of metadata standards in the field of intangible cultural heritage must first combine the basic concepts and characteristics of cultural heritage to extract specific attributes and provide element definitions that describe the basic characteristics of intangible cultural heritage resources, that is, core metadata. this is not easy to achieve since intangible cultural heritage is traditional art, music, folklore, etc. only by unifying intangible heritage resources of different expressions through metadata standards can a relatively standardized intangible cultural heritage resource library be formed.
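to make the idea of structured, descriptive dc metadata concrete, the following python sketch builds a small record from the fifteen elements of the dublin core element set and serializes it as xml. it is a minimal illustration, not the schema proposed in this article: the `record` wrapper element, the function name, and every field value are placeholders chosen for the example, with only the item name (the wuhan woodcarving ship model) taken from the abstract.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the simple Dublin Core set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_dc_record(fields):
    """Build an XML element holding repeatable Dublin Core fields.

    `fields` maps an element name to a list of values, because every
    simple Dublin Core element is optional and may be repeated.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Placeholder values for illustration only.
record = build_dc_record({
    "title": ["wuhan woodcarving ship model"],
    "type": ["traditional handicraft"],
    "subject": ["woodcarving", "ship models"],
    "coverage": ["wuhan, china"],
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

because every element is optional and repeatable, a record like this says little about cultural context on its own, which is the gap the domain-specific extensions discussed in this article are meant to fill.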
the visual resources association of america (vra) created the vra core metadata standard to describe art, architecture, prehistoric artifacts, folk culture, and other artistic visual resources in the network environment.3 in terms of intangible cultural heritage material, lan xuliu et al. proposed the vra core as the foundation format and added elements from the categories for the description of works of art (cdwa) as the extended element metadata format of digital cultural resources.4 a sculpture of abraham lincoln was used as the basis for the metadata format, and the example shows how the proposed metadata format is used in practice. the solution does not extend the core elements and there is an overall lack of flexibility as users cannot customize the required elements.

b. murtha proposed a descriptive metadata architecture in the field of art and architecture, including the core category of ontology id, and added a controlled vocabulary and classification system in the field of art and architecture to enrich the specific metadata model.5 it is mainly based on the theoretical discussion of metadata standards in this field, and there is no specific practice, but its method of formulating metadata from the perspective of user retrieval effects is worth learning.

yi junkai et al. proposed the core metadata specification for digital museums as the basis for expansion, implemented the relevant methods in the metadata expansion rules, and finally formed a special metadata specification.6 this metadata specification system can guarantee the basic and personalized description of resources. the metadata specification was developed and completed by the national museum of china.
to keep this specification consistent with the metadata description of other metadata specifications at home and abroad, the description method refers to the iso-11179 standard.7 the national museum metadata specification contains seven element sets, 60 elements, and 342 restricted elements. the seven metadata element sets are: collection resource entity, data resource entity, responsible entity, business entity, transaction entity, relationship entity, and save entity. each metadata element of the museum’s digital resources defines several elements according to the concept of hierarchical structure; each element is defined and described by a group of 11 attributes: name, version, logo, definition, type, value range, necessity, repeatability, lower-level elements, application scope, and annotations. the establishment of the metadata standard framework for museum digital resources is based on the digitization of museum collection resources. collection resources are the core of museum work, and the content of museum collection resources is the core component of digital resources.8 the digital resources of these collections are related to communication, transmission, storage, or business activities.

based on the characteristics of china’s existing intangible cultural heritage information resources, li bo proposed a compatible and interoperable metadata model. the description of intangible cultural heritage information resources was created on the basis of information structure and semantic component analysis.9 the ontological characteristics of each intangible cultural heritage information and related documents, characters, objects, spaces, and other entities are included in the construction of the intangible cultural heritage metadata model, which combines china’s nonmaterial cultural heritage.
the actual situation of the tangible cultural heritage database has a certain degree of international generality. ye peng compared the dc metadata standard system with the needs of china’s intangible cultural heritage protection, proposed a metadata standard based on intangible cultural heritage resources, and gave the scope of application. this metadata standard contains multiple core metadata, corresponding to the relevant elements in dc.10 however, ye peng also pointed out that a major problem with this intangible cultural heritage metadata standard is that it is not compatible with china’s existing intangible cultural heritage database, such as information storage, digital mining, file retrieval, and multimedia distribution.

connotation and design principles of intangible cultural heritage metadata

connotation of intangible cultural heritage metadata

the core metadata of this study will be designed based on the dc core metadata set, considering its versatility, scalability, easy conversion between metadata, interoperability between systems, and existing comparisons. universal dc metadata is the most influential and widely used metadata standard in the field of information resource description under the network environment.
since the dc metadata standard is mainly aimed at the retrieval of network entity resources, it reveals common characteristics of digital entity resources but does not consider the cultural connotation and knowledge context of specific knowledge topics such as intangible cultural heritage.11 to reveal the originality of the object, the model proposed in this article will also combine the application and recording of china’s intangible cultural heritage items, reflecting the characteristics of specific intangible cultural heritage items, so as to facilitate compatibility and integration with existing information resources to form a unified interface standard with the existing intangible cultural heritage management system of the cultural sector, enabling the sharing of digital resources among cultural centers in different regions. design principles of intangible cultural heritage metadata the design of the metadata model of intangible cultural heritage information resources should be fully compatible with popular metadata standards. various metadata standards apply to different objects: dc is suitable for network resources, cdwa is suitable for artworks, and federal geographic data committee (fgdc) is suitable for geographic space. when it comes to digital collections, the national library of the netherlands was one of the first institutions in the world to respond, starting in 1994 with the decision to collect digital publications and working with publishers and it partners to make important contributions to digital collections research. the national library of the netherlands will develop a new global information network. the main approach of the system is to add dc data to all collected web pages. the new web page will require providers to add elements of the dc core set by themselves. once submitted, the national library of the netherlands’ search engine will use these dc elements to assist in retrieval. 
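the article does not say how the dc elements were attached to the collected web pages. one widely documented convention, assumed here purely for illustration, embeds them in html `meta` tags whose names carry a `DC.` prefix. the python sketch below shows how a retrieval service might read such tags with the standard library parser; the class name, function name, and sample page are hypothetical.

```python
from html.parser import HTMLParser


class DublinCoreMetaParser(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> tags from a web page."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Keep only Dublin Core tags; other meta tags are ignored.
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()
            self.elements.setdefault(element, []).append(content)


# Hypothetical page supplied by a content provider.
SAMPLE_PAGE = """
<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="example digital publication">
<meta name="DC.creator" content="example author">
<meta name="DC.subject" content="metadata">
<meta name="DC.subject" content="digital collections">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""


def extract_dc(html_text):
    """Return a dict mapping each dc element found to its list of values."""
    parser = DublinCoreMetaParser()
    parser.feed(html_text)
    return parser.elements


dc_fields = extract_dc(SAMPLE_PAGE)
print(dc_fields)
```

a search index built from `dc_fields` can then match queries against titles, creators, and subjects without parsing the page body, which is the retrieval benefit described above.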
in recent years, the art museum community has adopted several metadata standards such as cdwa and vra core to describe their collections of art works. nam, y. j. and lee, s. m. proposed a set of metadata elements customized to fit into the distinct context of small-scaled art museums in south korea.12 a small art museum in korea combines the existing cdwa, vra core, and dc standards and the proposed set of metadata elements is expected to support artistic resources. the metadata design of intangible cultural heritage resources should refer to the design cases of the netherlands and south korea. when applying the existing metadata standards, it is beneficial to fully reveal the characteristics of the described objects and decide whether to adopt the overall framework or only part of it; the standards must not be applied blindly.13

the design of the metadata should connect with existing intangible cultural heritage information sources. at present, china should refer to the relevant standards of world intangible cultural heritage digital resources and establish a management system for intangible cultural heritage resources in line with the national, provincial, municipal and county levels. the relevant cultural management functional departments have also established relevant information systems to form a unified set of authoritative and standardized data. therefore, in terms of the elements and concepts used in the metadata model, special attention should be paid to the connection with these existing data models, so that as new resources are developed, these rich information sources can be shared through the mapping relationship between the elements.

the metadata model should have good scalability and strong descriptive ability containing more elements.
therefore, an element-rich metadata model has a strong influence on the organization and management of information resources and content disclosure. data inspection should be flexible. conversely, a metadata model that lacks elements will be less flexible when technology is upgraded or user description requirements are expanded. such a model requires constant expansion and modification, and its practicality is greatly reduced. on the other hand, the design of the metadata model should include a mechanism that allows different types of users to add elements according to their needs.

the design of metadata can show the relationship between intangible cultural heritage resource entities. with the development of information resource description technology at home and abroad, a number of metadata standards for various types of information resources have been formed. the metadata standards for china’s intangible cultural heritage should aim for compatibility and integrate existing world standards based on current results. the metadata should be further expanded and developed in line with the goal of preserving intangible cultural heritage works. the metadata standards for intangible cultural heritage archives should describe resources while displaying the greatest degree of versatility, compatibility, and standardization. therefore, combining the requirements of cultural heritage archiving and the characteristics of intangible cultural heritage, the dc metadata standard is used as the basic standard, and the advantages of other metadata standards are combined to determine the metadata standard of china’s intangible cultural heritage archives.
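the idea of taking dc as the base standard and sharing records through a mapping between elements can be illustrated with a small crosswalk. the sketch below is only an illustration under stated assumptions: the local field names (`item_name`, `inheritor`, `heritage_category`, `introduction`, `region`, `protection_level`) and the mapping itself are hypothetical and are not taken from the article; only the fifteen dc element names are standard.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical crosswalk: local field name -> DC element (illustrative only).
CROSSWALK = {
    "item_name": "title",
    "inheritor": "creator",
    "heritage_category": "subject",
    "introduction": "description",
    "region": "coverage",
}


def to_dc(record, crosswalk=CROSSWALK):
    """Split a local record into DC-mapped fields and local extension fields."""
    mapped, extensions = {}, {}
    for field_name, value in record.items():
        target = crosswalk.get(field_name)
        if target in DC_ELEMENTS:
            # DC elements are repeatable, so collect values in a list.
            mapped.setdefault(target, []).append(value)
        else:
            # Fields without a DC counterpart stay as local extensions.
            extensions[field_name] = value
    return mapped, extensions


record = {
    "item_name": "wuhan woodcarving ship model",
    "region": "wuhan, hubei",
    "protection_level": "national",  # no DC counterpart in this sketch
}
mapped, extensions = to_dc(record)
print(mapped)
print(extensions)
```

keeping unmapped fields in a separate dictionary mirrors the extensibility requirement: the base standard stays untouched while local elements can be added as needed.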
intangible cultural heritage knowledge organization

definition of intangible cultural heritage metadata

through semantic analysis, the core attributes and concepts involved in metadata can be obtained, and the specificity of attributes and concepts can be improved through metadata standards, which can make users’ cognition, retrieval, and evaluation of information more accurate and effective. at the same time, the normative concepts and common attributes in existing metadata schemes should be reused as much as possible. for intangible cultural heritage, according to the attributes and characteristics of the object, close and similar conceptual entities are selected from one or more general metadata schemes to make the element definition universal and normative. in the “convention for the safeguarding of intangible cultural heritage,” unesco pointed out that the types of intangible cultural heritage include oral traditions, performing arts, social practices, festivals, and traditional handicrafts. based on the above-mentioned definitions of intangible cultural heritage types and the previous comparative research results on metadata standards of various countries, combined with the research results of scholars, the dc standard metadata name and standard affix library incorporates a set of intangible cultural heritage archive metadata containing 23 elements and extended elements (table 1).

table 1. core metadata of intangible cultural heritage digital resources

| category | standard metadata name | field name | annotation |
|---|---|---|---|
| content | title | dc_title | name and content of intangible cultural heritage |
| | category | dc_category | |
| | bintroduction | dc_bintroduction | |
| creator | mcreator | dc_creator_own | creator identity information |
| | nation | dc_creator_nation | |
| | sex | dc_creator_sex | |
| | age | dc_creator_age | |
| | area | dc_creator_area | |
| | biography | dc_creator_biography | |
| category | dance | dc_category_dance | heritage list category |
| | song | dc_category_song | |
| | literature | dc_category_literature | |
| | quyi | dc_category_quyi | |
| | art | dc_category_art | |
| resources | video | dc_category_video | resource type includes a description of resource content |
| | picture | dc_resources_picture | |
| | text | dc_resources_text | |
| | network | dc_resources_network | |
| organization | area | dc_organization_area | organization information |
| | principal | dc_organization_principal | |
| | officephone | dc_organization_officephone | |
| | jobtitle | dc_organization_jobtitle | |
| | introduction | dc_organization_introduction | |

intangible cultural heritage metadata standards unify the information format and mutual mapping relationship of intangible cultural heritage digital achievements. on the one hand, a single standard removes barriers to sharing metadata caused by having intangible cultural heritage information resources with different hardware, different platforms, and different formats. on the other, it enables the digital resources of intangible cultural heritage to be shared online. for example, the china intangible cultural heritage digital museum (https://www.ihchina.cn/) uses unified metadata in its design, which solves the problem of integrating and sharing different resources. the smooth conversion between new and old data is beneficial to the protection of intangible cultural heritage inventory data, avoiding duplication of work, and improving the efficiency and effectiveness of intangible cultural heritage storage.
in addition, design of intangible cultural heritage metadata standards must consider the versatility, compatibility, and individualization of the metadata system.

description of digital resources of intangible cultural heritage

through the analysis of the intangible cultural heritage project objects, we can identify attributes such as content, management, and resources. these attributes can correspond to the elements of the metadata during the metadata design or serve as the semantic basis for the definition of the elements. the analysis and extraction of the core attributes and concepts of the object should first consider the full presentation of the object knowledge and resource content, and the concept should have a certain degree of specificity so that users can recognize, retrieve, and evaluate the information. secondly, it is important to refer to the normative concepts and general attributes in the existing metadata schemes as much as possible. according to the attribute characteristics of the object, select close and similar conceptual entities from one or more general metadata schemes, so that the element definition is versatile and standardized. therefore, the content description of intangible cultural heritage items should reflect unique cultural meanings and characteristics. at present, there are only general concepts such as name, category, subject, and region among several general metadata schemes. figure 1 shows the metadata framework of intangible cultural heritage.

figure 1. metadata framework of intangible cultural heritage.

in the content description, there are five elements: names, types, subjects, regions, and protection levels. of these, the name, subject, category, and region are available in general metadata standards, while the protection level is a special attribute.
the protection level indicates whether the intangible cultural heritage item is listed for national or provincial protection. the “national intangible cultural heritage declaration form” uses the five elements of content description, and then conducts resource description analysis based on the information organization structure of the intangible cultural heritage project object to construct the intangible cultural heritage description framework. the framework covers the main attributes and definitions involved in intangible cultural heritage objects, as well as their connections and hierarchical relationships. in the description framework, in addition to the attributes and definitions specified by the dc metadata, a set of custom elements is also set.14 without changing the basic structure, users can customize elements according to their needs to make the model extensible.

in the description of related resources, entities related to intangible cultural heritage are divided into four categories: inheritors, object categories, resources, and organizations. among them, the inheritor-related attributes include six general attributes: name, ethnicity, gender, region, age, and person profile. object category attributes include dance, song, art, literature, video, network, etc. for visual objects, one can refer to the categories for the description of works of art (cdwa) or the vra core categories for visual materials, while documents and materials can use the metadata defined by dc. this model does not specify the use of attributes and concepts in metadata. in a specific metadata solution, these attributes and concepts can correspond to metadata element names, or they can be modifiers, values, or metadata element definitions; for example, the inheritor of shadow puppetry is lin shimin.
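the description framework above can be pictured as a pair of record types. the following is a minimal sketch, assuming python dataclasses as the container; the class names, attribute names, and the `custom` dictionary for user-defined elements are hypothetical choices made for illustration, and fields whose values the article does not give are left empty rather than guessed.

```python
from dataclasses import dataclass, field


@dataclass
class Inheritor:
    # The six inheritor attributes named in the text.
    name: str
    ethnicity: str = ""
    gender: str = ""
    region: str = ""
    age: str = ""
    profile: str = ""


@dataclass
class HeritageItem:
    # The five content-description elements named in the text.
    name: str
    item_type: str = ""
    subject: str = ""
    region: str = ""
    protection_level: str = ""
    # Related entities and user-defined extension elements.
    inheritors: list = field(default_factory=list)
    custom: dict = field(default_factory=dict)


# The only values taken from the article: shadow puppetry and its inheritor.
item = HeritageItem(name="shadow puppetry",
                    inheritors=[Inheritor(name="lin shimin")])

# A custom element added without changing the basic structure (hypothetical).
item.custom["local_note"] = "example value"
print(item)
```

placing extensions in a separate mapping is one way to let users add elements while the core structure stays fixed; other designs, such as subclassing, would serve the same purpose.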
data association

linked data is a set of practices for publishing structured data on the web, built on standards recommended by the world wide web consortium (w3c). the relationships among linked data objects support a greater degree of resource sharing and utilization, enabling users to efficiently and accurately locate needed resources on a larger scale. publishing linked data means describing the metadata of cultural resources in the form of the resource description framework (rdf). after semantic associations are formed, intelligent retrieval and data discovery services are provided on the intelligent application platform, so as to ensure the visual presentation and data sharing of intangible cultural heritage digital resources in knowledge organizations. linked data publishing provides standardized data access specifications. the biggest advantage is that it can correlate data across platforms and establish links between different data, which is convenient for users to search for data in different repositories. as far as the content of intangible cultural heritage is concerned, linked data presents unstructured, semi-structured, and structured data on the internet in the form of rdf. rdf description refers to the transformation of metadata in resources into rdf triples through data and relationship mapping, and the formation of w3c-supported documents through semantic relationship construction. visual presentation refers to the display of relevant content to users, who retrieve it through network search with the support of the network architecture. in essence, publishing digital resource data realizes the rdf description and sharing of intangible cultural heritage metadata by reusing relationships; it is a process of database management and application.
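to make the rdf description step more concrete, the sketch below turns a few metadata values into rdf triples and writes them in n-triples syntax using only the python standard library. it is an illustration under assumptions, not the article's implementation: the `example.org` namespaces, the uri patterns, and the `inheritor` linking property are hypothetical, while `dc_title` and `dc_creator_own` are field names from table 1.

```python
BASE = "http://example.org/ich/"         # hypothetical resource namespace
VOCAB = "http://example.org/ich/terms/"  # hypothetical property namespace


def ntriple(subject, predicate, obj, obj_is_uri=False):
    """Format one triple as an N-Triples line.

    Literal objects are quoted, with backslashes and double quotes escaped;
    this sketch does not handle newlines or language tags.
    """
    if obj_is_uri:
        rendered = f"<{obj}>"
    else:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {rendered} ."


# Each described entity receives its own URI identifier.
item = BASE + "item/shadow-puppetry"
person = BASE + "inheritor/lin-shimin"

lines = [
    # Metadata values become literal-valued triples.
    ntriple(item, VOCAB + "dc_title", "shadow puppetry"),
    ntriple(person, VOCAB + "dc_creator_own", "lin shimin"),
    # A relational link between two resources forms the semantic association.
    ntriple(item, VOCAB + "inheritor", person, obj_is_uri=True),
]
print("\n".join(lines))
```

the third triple is what distinguishes linked data from a flat record: its object is another resource's uri, so a client can follow the link from the item to its inheritor.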
the linked data publishing process of intangible cultural heritage resources consists of three steps: (1) converting the metadata of the repository into an rdf triple model and assigning a uri identifier to form an rdf document of linked data; (2) establishing a semantic relationship and building relational links to form semantic associations; and (3) mapping cultural resource data to the network through the uri access mechanism, and presenting data search results in a visual way through user sparql queries. although there are differences in the structure of different data publishing tools, metadata-based linked data publishing follows these basic steps.

examples of metadata application of intangible cultural heritage knowledge organizations

introduction to the wuhan wood carving ship model intangible cultural heritage project

the wuhan woodcarving ship model is a unique art variety in chinese woodcarving craftsmanship, with a history of more than 2,000 years.15 according to the song dynasty’s the history of jin shi·zhang zhongyan: “the craftsman did not know how to build the ship. when the boat was built, the craftsmen did not know how to build it. the boat model made by zhang zhongyan. it was only a few inches long and was very delicate. the front and rear of the boat could be spliced well without glue. the other craftsmen were all amazed.” as early as the 12th century, there were people in china who could carve small boats several inches long as models for making ships. hubei woodcarving boats are a national intangible cultural heritage project, but the art and craft face challenges. like other intangible cultural heritage projects, development of the craftwork is weak.
while younger generations in hubei may recognize the form of the wooden carving boat, few are willing to learn this art and many young people have not even heard of it. in order to better honor this long-standing tradition, this article focuses on the characteristics of intangible cultural heritage digital resources and, combined with the relevant theories of knowledge organization, adopts certain technical standards to construct the knowledge organization and metadata standards for hubei woodcarving ships.

knowledge organization construction based on metadata

to effectively use metadata in intangible cultural heritage, metadata specifications must be defined and described. rdf is a metadata specification description language. it can semantically describe the attributes of the ontology and the interrelationships between these attributes. by using rdf, information can be easily exchanged between computers using different types of operating systems and application languages.16 rdf regulates the realization of semantics in a standardized and interoperable way. web pages can invoke rdf in a simple way, thereby facilitating the retrieval of network data and the discovery of related knowledge. in this paper, the metadata system uses rdf to define the attributes, so that they can be better transformed into a language that the computer can understand. intangible cultural heritage items have a certain relationship with inheritors, organizations, resource content, etc. in order to establish a complete intangible cultural heritage cultural resource database, these entities need to be described separately in rdf.

wuhan woodcarving ship model metadata definition

according to the rdf description, the wuhan woodcarving ship model is used as a specific example to show the designed metadata scheme, that is, the relevant content of the example is filled into the defined resource description framework.
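as a rough illustration of describing one entity separately in rdf, the sketch below serializes an inheritor record as rdf/xml with python's standard `xml.etree.ElementTree` module. it should be read as an assumption-laden example rather than the article's own markup: the `ich` namespace and the resource uri are hypothetical, the property names reuse field names from table 1, and the two values are the inheritor name and area reported for the wuhan woodcarving ship model example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ICH_NS = "http://example.org/ich/terms/"  # hypothetical property namespace

# Use readable prefixes instead of auto-generated ones (ns0, ns1, ...).
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ich", ICH_NS)


def describe(about_uri, properties):
    """Return an RDF/XML string with one rdf:Description for about_uri."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": about_uri},
    )
    # Each metadata field becomes a property element with a literal value.
    for name, value in properties.items():
        ET.SubElement(description, f"{{{ICH_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = describe(
    "http://example.org/ich/inheritor/long-congfa",  # hypothetical URI
    {"dc_creator_own": "long congfa", "dc_creator_area": "wuhan, hubei"},
)
print(xml_text)
```

the same function could be called once per entity type (item, inheritor, organization, resource), which matches the requirement that these entities be described separately.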
for example, part of the rdf description of the wuhan woodcarving ship model intangible cultural heritage item contains the following values for the inheritor:

long congfa
men
wuhan, hubei
family-level intangible cultural heritage project-inheritor of wuhan woodcarving ship model

conclusion

this article reviews the classification system of china’s intangible cultural heritage items and the integration of existing knowledge organizations and other types of resources, designs a set of more comprehensive and reasonable metadata standards with a certain degree of scalability, and applies it to an actual intangible cultural heritage knowledge organization. to effectively protect and use the digital resources of intangible cultural heritage, further research is needed. additional discussion on updating and promoting existing metadata specifications as well as multidimensional aggregation of existing resources to achieve knowledge discovery is needed. through the integration of linked data and sharing existing digital resources, this article can encourage scholarship and conversation that leads to the preservation of china’s intangible cultural heritage.

funding statement

this work was supported by the hubei key laboratory of big data in science and technology. this work was also supported by the palace museum’s open project in 2021, research on the dissemination of intangible cultural heritage of the palace museum from the perspective of artificial intelligence. this subject has been funded by the mercedes-benz star wish fund of china youth foundation.
endnotes

1 feng xiangyun, xiao long, liao sansan, and zhuang jilin, “a comparative study of commonly used foreign metadata standards,” journal of university libraries 4 (2001): 15–21, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2001&filename=dxts200104005&uniplatform=nzkpt&v=v9a8p-rcf-4csl9yoaqskj5nbnfjrmwjhsaoj2pnqq9jl0tdsle3ntrjrzeto32h.

2 ma min, “metadata: the basic format for organizing online information resources,” information science 4 (2002): 377–79, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2002&filename=qbkx200204012&uniplatform=nzkpt&v=yemo5mxwo0mzg5mkz6qml62oruvfchtdy2slxdbn_hesfdvspxuc-naorq0v0ikl.

3 “specification data function requirements,” november 24, 2014, http://eprints.rclis.org/13191/1/frad_2009-zh.pdf.

4 lan xuliu and meng fang, “metadata format analysis of digital cultural resources,” modern information 33, no. 8: 61–64, 102, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2013&filename=xdqb201308015&uniplatform=nzkpt&v=skctnh3sg04qrgzqahxdh3nj2hmpk2ppmjbp4ymnpdq-phf2ffjwxpp5vcns9qc9.

5 murtha baca, “practical issues in applying metadata schemas and controlled vocabularies to cultural heritage information,” cataloging & classification quarterly 36, no. 3–4 (2003): 47–55, https://doi.org/10.1300/j104v36n03_5.

6 yi junkai, zhou yubin, and chen gang, “research and practice of scalable digital museum metadata specification,” digital library forum 2 (2014): 43–53, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfdtemp&filename=sztg201402011&uniplatform=nzkpt&v=tf76zueher7ymnfxdfafenmm2z2tetze08zqkdhoc7wq2zwtkoao3i0ei7oyvcf1.

7 jin saiying, “research on chinese and foreign art image metadata and framework,” new art 37, no. 1 (2016): 129–32, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfdlast2016&filename=xmsh201601019&uniplatform=nzkpt&v=eynvoucbcnpzjkw84mxeabs--auqafuwanchem0p5phcmjw0s7jttnplobqop0_h.

8 xiao long and zhao liang, introduction and examples of chinese metadata (beijing: beijing library press, 2007).

9 li bo, “research on metadata model of intangible cultural heritage information resources,” library circle 5 (2011): 38–41, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2011&filename=tsgu201105016&uniplatform=nzkpt&v=unflzsdezr0jue0ut_npb7h0ri5vioemybvm3zytqfh2quzuycubz5tzrbshnkwh.

10 ye peng and zhou yaolin, “the framework and standards of chinese intangible cultural heritage metadata,” 2013 international conference on applied social science research (paris: atlantis press, 2013).
11 bamo qubumo, guo cuixiao, yin hubin, and li gang, “customizing discipline-based metadata standards for digital preservation of living epic traditions in china: basic principles and challenges,” 2013 digital heritage international congress, https://ieeexplore.ieee.org/document/6744746.

12 y. j. nam and s. m. lee, “localization of metadata elements in the art museum community,” 충청문화연구 46, no. 2 (2012): 139–74.

13 bamo qubumo, c. guo, h. yin, et al., “customizing discipline-based metadata standards for digital preservation of living epic traditions in china: basic principles and challenges,” digital heritage international congress, ieee, 2014.

14 chao gojin, “unesco ethical principles for the protection of intangible cultural heritage: an introduction and comment,” inner mongolia social sciences (chinese version) 37, no. 05 (2016): 1–13, https://doi.org/10.14137/j.cnki.issn1003-5281.2016.05.00.

15 chen junxiu, “research on the mode of productive protection and utilization of intangible cultural heritage,” learning and practice 5 (2015): 118–23, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfdlast2015&filename=xxys201505014&uniplatform=nzkpt&v=telpps4abo6-qidxtqjyu9a_hy0q6ukovi4x5nz8br-u33pzq6py2d1cshqlclnw.
16 zhao zhihui, “visual analysis of the evolution path and hot frontiers of cultural heritage digitization research,” library forum 2 (2013): 33–40, https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfd2013&filename=tsgl201302007&uniplatform=nzkpt&v=yezmntrx2f00eqvogxwtz5yehk3zz1dm8layjik4l1lmjvvjuq7gaiymloplnmiv.

article

applying topic modeling for automated creation of descriptive metadata for digital collections

monika glowacka-musial

information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.13799

monika glowacka-musial (monikagm@nmsu.edu) is assistant professor/metadata librarian, new mexico state university library. © 2022.

abstract

creation of descriptive metadata for digital objects tends to be a laborious process. specifically, subject analysis that seeks to classify the intellectual content of digitized documents typically requires considerable time and effort to determine subject headings that best represent the substance of these documents. this project examines the use of topic modeling to streamline the workflow for assigning subject headings to the digital collection of new mexico state university news releases issued between 1958 and 2020. the optimization of the workflow enables timely scholarly access to unique primary source documentation.

introduction

digital scholarship relies on digital collections and data. in the influential book digital_humanities, anne burdick and her associates affirm that humanistic knowledge production depends on collection building and curation.1 access to historical documents and data resources is essential for the development of new research questions and methodologies.2 this project utilizes topic modeling to support building a digital collection of institutional news releases. it is one of the initiatives to apply digital technologies in the library workflows.
new mexico state university news releases

in response to a growing scholarly and public interest in original university press announcements, the digitization of past nmsu print news releases was approved in september 2013. sixty years of news releases starting from the late 1950s to the present were to be included. one of the arguments presented in justification of the project was that these institutional news briefs have a truly unique historical value. researchers view university press announcements as anchors in the history of nmsu and the region, particularly for dating events and initiatives. they also find official communications essential for studying the way the news was framed by participants and the university administration. historically, the relationships between the university and the local media had always been a major concern of college administrators: how to respect the freedom of the press while ensuring responsible and factual journalism, and how to build an effective partnership that would benefit both sides?3 to address these questions, the administration early on established the college’s information services that have issued news releases about campus events, programs, and developments in the college’s research, teaching, and service. these formal news reports representing the perspective of the university have been regularly distributed to local and worldwide media for many decades. this collection has become one of the most popular primary sources documenting a history of the southwestern educational institution.

since the beginning of the digitization project, thousands of press releases have been scanned, described, and added to the digital collection. currently, the collection features press releases issued by the university between 1958 and 1974.
there is still a lot to be done. the most time-consuming element in the process is adding metadata, including library of congress subject headings, to individual news releases. with decreasing personnel, dwindling library resources, and competing work priorities, the progress on the project has slowed substantially. its revitalization requires a fresh, problem-solving approach that would allow for a significant reduction of time that catalogers spend on metadata creation. in search of a viable solution, topic modeling, a computational tool for classifying large collections of texts, was put to the test and generated promising results. the following sections describe the tools, data, and process created for this experiment in some detail.

topic modeling and its applications

topic modeling (tm) is one of the methodologies used in natural language processing (nlp). it was specifically designed for text mining and discovering hidden patterns in huge collections of documents, images, and networks.4 according to practitioners, topic modeling is best viewed as a statistical tool for text exploration and open-ended discovery.5 it has been used extensively in computer science, genetics, marketing, political science, journalism, and digital humanities for the last two decades.
a growing literature on topic modeling applications provides clear evidence of its viability.6 examples of tm applications in digital social sciences and humanities include finding geographic themes from gps-associated documents on social media platforms such as flickr and twitter,7 selecting news articles on opposition to euro currency from financial times data,8 identifying paragraphs on epistemological concerns in english and german novels,9 tracking research trends in different disciplines,10 and revealing dominant themes in newspapers,11 governance literature,12 and wikipedia entries.13 topic modeling was applied in addition to text mining to enhance access to large digital collections by providing minimal description and enriching metadata, including subject headings.14 also, a possibility of using topic modeling to determine the subject headings for books on project gutenberg was explored.15

topic modeling in a nutshell

topic models help to identify the contents of document collections. topic modeling is a process of discovering clusters of words that best represent a set of topics. figure 1 shows the basic idea behind topic modeling. a large collection of text documents (the scrolls on top) consists of thousands of words (shown symbolically at the bottom). the algorithm looks for the most frequent words that tend to occur in proximity and clusters them together. each cluster, referred to as a topic, has a set of words with their probabilities of belonging to a given topic. each document in the collection displays a set of combined topics to different degrees. here, documents are seen as mixtures of topics, and topics are seen as mixtures of words.16 topics also provide context to words. documents that have similar combinations of topics tend to be related. as a result, a large collection of text documents can be represented by a limited set of topics (as presented by icons in the middle of the figure).
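the statement that documents are mixtures of topics and topics are mixtures of words can be worked through with a few numbers. the sketch below is a toy illustration only: the two topics, three words, two documents, and all probabilities are invented for the example and do not come from the nmsu collection. it computes the probability of each word in a document as the topic shares of the document weighted by the word probabilities of each topic.

```python
# topic_words[t][w]: probability of word w within topic t (rows sum to 1).
topic_words = {
    "research": {"grant": 0.5, "laboratory": 0.4, "game": 0.1},
    "athletics": {"grant": 0.1, "laboratory": 0.0, "game": 0.9},
}

# doc_topics[d][t]: share of topic t in document d (rows sum to 1).
doc_topics = {
    "release_a": {"research": 0.8, "athletics": 0.2},
    "release_b": {"research": 0.1, "athletics": 0.9},
}


def word_distribution(doc):
    """p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)."""
    vocabulary = sorted({w for words in topic_words.values() for w in words})
    return {
        word: sum(doc_topics[doc][topic] * topic_words[topic][word]
                  for topic in topic_words)
        for word in vocabulary
    }


dist_a = word_distribution("release_a")
dist_b = word_distribution("release_b")
print(dist_a)
print(dist_b)
```

because both documents draw on the same two topics in different proportions, their word distributions differ even though they share a vocabulary, which is why documents with similar topic combinations tend to be related.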
figure 1. basic idea behind topic modeling.

topics and subject headings combined

the original purpose of topic modeling, as formulated by david blei and his associates in 2003, was to make large collections of texts more approachable for scholars by organizing texts automatically based on latent topics.17 these hidden topics can be discovered, measured, and consequently used by scholars to navigate the collection. the purpose of assigning subject headings is to identify “aboutness,” or simply subject concepts, covered by the intellectual content of a given work, and then again collocate related works.18 since both topic models and subject headings have a similar purpose, although very different methodology and scale, we decided to combine them and make topic models a prerequisite for assigning subject headings. in such a scenario, the computer deals with the scale of text collections that are beyond human reading capacity and catalogers then fine-tune the results generated by the algorithm. the following methods section shows the successive stages involved in the process of semiautomated assignment of subject headings to documents.

methods overview

for topic modeling, we used the algorithm of latent dirichlet allocation (lda).19 lda takes a document-term matrix, with rows corresponding to documents and columns corresponding to terms (words), and, based on semirandom exploration, finds optimal probabilities of topics in documents (called gammas) and probabilities of terms in topics (called betas). after lda generates a set of topics that best represent the collection of news releases, each topic is associated with several subject headings that were previously assigned to news releases by catalogers. for a new news release, lda finds a set of most representative topics.
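The document-term matrix that LDA takes as input can be illustrated with a minimal sketch. It is written in Python for illustration only; the authors worked in R, and this is not their code. The stop-word list, the sample titles, and the function names `tokenize` and `document_term_matrix` are assumptions, and stemming is left out for brevity.

```python
from collections import Counter

# Hypothetical stop-word list; the article used standard plus custom stop words.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def tokenize(title):
    """Lowercase a title and keep alphabetic tokens that are not stop words."""
    words = (w.strip(".,;:'\"()").lower() for w in title.split())
    return [w for w in words if w.isalpha() and w not in STOP_WORDS]


def document_term_matrix(titles):
    """Return (vocabulary, rows); rows[i][j] counts term j in document i."""
    counts = [Counter(tokenize(t)) for t in titles]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Invented titles, used only to show the shape of the matrix.
titles = [
    "Students rehearse for the spring concert",
    "Concert tickets on sale for students",
]
vocab, dtm = document_term_matrix(titles)
for title, row in zip(titles, dtm):
    print(row, title)
```

Each row is one title and each column one term, which matches the layout described above; a topic model would be fitted on a matrix of this shape built from the full set of titles.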
subject headings associated with the dominant topics are combined into a list of subject candidates presented to a cataloger. the last step involves a cataloger using a short list of subject candidates for selecting subject headings for news releases.

training data

training data used in this project consists of over 6,000 news releases (from 1958 to 1967) annotated with metadata. only two metadata properties (titles and subject headings) were considered. created by catalogers, both properties reflect the content of news releases accurately, although mistakes may sometimes happen. the values from the titles field were converted into a document-term matrix that, in turn, became an input for the algorithm. texts produced by ocr on original news releases were not included in the analysis due to their poor quality.

detailed steps of the proposed method:

1. topic modeling on training data:
   a. run standard preprocessing of training text data, including tokenization, stop-word removal, and stemming.
   b. run topic modeling (lda) where each document from the training data set is assigned a set of topics (subsets of words), each one with a measurable contribution to the document.
2. assignment of subject headings to topics.20 for each topic:
   a. select a number of documents with the highest probability (gamma) for the topic. we used 400.
   b. gather the set of subject headings assigned to the documents selected in 2.a and arrange them in order of decreasing frequency (freq) of occurrence in the set.
3. assignment of subject headings to a new document:
   a. assign to the new document gammas (probabilities) of topics using the lda model trained in 1.b.
   b. for each topic in turn, calculate the weight of each subject heading in the document as the product of its frequency in the topic (freq) and the probability of the topic (gamma) in the document; for subject headings duplicated across topics, sum their weights across topics.
   c. create a list of the candidate subject headings processed in 3.b, in descending order of their weights in the document.

implementation

there is a growing number of tools used for topic modeling.21 for this project, we used the r programming language, which has many packages for data preprocessing and topic modeling (tm).22 the r packages used for this project are listed below:

• topicmodels, with the functions LDA() for producing topic models, posterior() for assigning topics to test documents using pretrained models, and perplexity() for perplexity calculation23
• tidytext, with tidying functions for rearranging and exploring data as well as for interpreting the models
• textstem for preprocessing data, including stemming and lemmatization
• tidyr, dplyr, and stringr for data and string manipulation and arrangement
• ggplot2 for data visualizations

the code related to topic modeling was mostly reused from the datacamp class on topic modeling.24 occasionally, the data.table data structure was applied instead of data.frame. in addition to standard stop words, custom stop words including initials, names of weekdays, and dates were removed from the corpus using the function anti_join(). for finding topics in test documents by a pretrained model, the function posterior() from the r package topicmodels was used.25 the extra step needed before using posterior() was to align the new document with the document-term matrix used for training the lda model.26

results

for assessing the method’s performance, we adopted the idea of recall.
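The candidate lists that recall is measured on come from the weighting in steps 3.b and 3.c of the method, and that weighting can be sketched in a few lines. This is a Python illustration under stated assumptions, not the authors' R code: freq is taken here to be a relative frequency among the headings of a topic's top documents (the article does not say whether freq is a count or a proportion), and the topic names, headings, and gamma values are invented.

```python
from collections import Counter, defaultdict


def heading_frequencies(top_doc_headings):
    """Step 2.b: relative frequency of each heading among a topic's top documents.

    Assumption: freq is normalized by the total number of heading occurrences.
    """
    counts = Counter(h for headings in top_doc_headings for h in headings)
    total = sum(counts.values())
    return {heading: n / total for heading, n in counts.items()}


def candidate_headings(gammas, topic_freqs):
    """Steps 3.b and 3.c: weight = freq * gamma, summed over topics, then ranked."""
    weights = defaultdict(float)
    for topic, gamma in gammas.items():
        for heading, freq in topic_freqs[topic].items():
            weights[heading] += freq * gamma
    # Descending weight; ties are broken alphabetically to keep output stable.
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))


# Invented data: headings of the top documents for two hypothetical topics.
topic_freqs = {
    "t1": heading_frequencies([["theater", "students"], ["theater"], ["students"]]),
    "t2": heading_frequencies([["concerts"], ["concerts", "students"], ["concerts"]]),
}
# Invented gammas for one new document (in practice these come from the model).
gammas = {"t1": 0.5, "t2": 0.5}

ranked = candidate_headings(gammas, topic_freqs)
for heading, weight in ranked:
    print(f"{heading}\t{weight:.3f}")
```

A heading that occurs under several topics, such as "students" here, accumulates weight from each of them, which is how a heading that is only moderately frequent in any single topic can still rank near the top of the list.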
in this specific context, recall is defined as the fraction of original subject headings (i.e., those assigned to a document manually by a cataloger) that are present on the list of candidate subject headings produced by the method. the average recall is estimated using a leave-one-out setting.27 once a single test document is set aside, the lda model is trained on the remaining documents and recall is calculated for the tested document using the list of candidate subject headings produced by the method. then, recall is averaged over a set of testing documents. this approach produces an estimate of the method’s performance if tested on a new document.

figure 2. average recall as a function of size of list with subject headings candidates.

figure 2 shows the dependence of average recall on the length of the list of candidate subjects produced by the method. recall is averaged over 1,500 randomly selected test documents. the dashed line represents the chance level performance, i.e., when the method would produce a random subset of all subject headings available in the data. on a list of 100 suggested subject headings, the recall is on average above 0.6 and for a list of 500 candidate subject headings, above 0.8. even though the average recall stays noticeably below 1 (recall value 1 would mean perfect performance), it is still considerably above the chance level. the results presented in figure 2 were produced by the lda model trained with 16 topics. one of the parameters affecting the method performance is the number of topics used by the lda model. for finding the number of topics corresponding to the highest recall, an overall measure of recall across different lengths of the subject candidate list was defined as the cumulative recall for the first 100 subject candidates.
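Recall for a candidate list of a given length, as defined above, can be sketched as follows. This is a Python illustration with invented data and hypothetical function names; it covers only the scoring and averaging, not the leave-one-out retraining of the model or the cumulative measure.

```python
def recall_at_k(original, candidates, k):
    """Fraction of cataloger-assigned headings found in the first k candidates."""
    wanted = set(original)
    if not wanted:
        raise ValueError("document has no original subject headings")
    shortlist = set(candidates[:k])
    return len(wanted & shortlist) / len(wanted)


def average_recall(test_docs, k):
    """Mean recall over (original_headings, ranked_candidates) pairs."""
    scores = [recall_at_k(original, candidates, k) for original, candidates in test_docs]
    return sum(scores) / len(scores)


# Invented test documents: cataloger headings and a ranked candidate list.
test_docs = [
    (["theater", "plays"], ["theater", "students", "plays"]),
    (["concerts"], ["music", "awards", "concerts"]),
]

print(average_recall(test_docs, k=2))
print(average_recall(test_docs, k=3))
```

Because most documents carry only a few headings, per-document recall can take only a handful of values (0, 0.5, or 1 for a document with two headings), so the averaged curve is smoother than any single document's score.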
we assumed that 100 is a likely size of candidate lists that catalogers would be willing to go through. figure 3 shows the cumulative recall for different numbers of topics, on the basis of which 16 topics were chosen as the optimum. interestingly, this corresponds well with the dependence of perplexity on the number of topics (fig. 4). the perplexity, a measure of the model’s surprise at the data, shows how the model fits the data: a smaller number means a better fit, i.e., a better topic model.28

figure 3. cumulative recall as a function of number of topics in the lda model.

figure 4. perplexity of the lda model as a function of number of topics in the lda model.

to give a better idea about the method performance, figure 5 shows the distribution of recall for individual test documents, for a list of 100 subject headings. since most documents in the training data have just a few subject headings, there is only a small set of discrete values possible for recall for individual documents. the distribution is wide, with a fraction of documents with no subject heading present on the proposed list (recall = 0) but also with a bigger fraction of documents fully covered by the list (recall = 1).

figure 5. distribution of recall across 1,500 test documents, for 100 subject candidates (for 16 topics).

the following examples show the sets of subject headings selected by the algorithm that include subject headings (in bold blue) chosen originally by catalogers.
example 1

title of news release: “‘romeo and juliet’ play part of campus celebration for 400th anniversary of shakespeare's birth”

subjects weights
new mexico state university. playmakers 0.280
theater 0.143
students 0.080
academic achievement 0.080
theater--production and direction 0.075
high school students 0.052
competitions 0.048
new mexico state university. college of engineering 0.042
plays 0.041
debates and debating 0.038
new mexico state university. aggie forensic festival 0.036
zohn, hershel 0.034
shakespeare, william, 1564-1616. a midsummer night's dream 0.034
forensics (public speaking) 0.034
frisch, max, 1911-1991. firebugs 0.027
tickets 0.027
theater rehearsals 0.027
new mexico state university. college of agriculture and home economics 0.022
shakespeare, william, 1564-1616. romeo and juliet 0.020
frisch, max, 1911-1991 0.020
performances 0.020
garcia lorca, federico, 1898-1936. casa de bernarda alba. english 0.020
molière, 1622-1673. bourgeois gentilhomme. english 0.020
anniversaries 0.014
new mexico state university. college of teacher education 0.012

example 2

title of caption to photo: “locals barbara gerhard, donna herron, lillian jean taylor rehearse for upcoming concert”

subjects weights
concerts 0.123
new mexico state university. university-civic symphony orchestra 0.085
institution. playmakers 0.077
united states. air force rotc 0.073
united states. army. reserve officers' training corps 0.062
military cadets 0.058
award presentations 0.054
theater 0.039
award winners 0.038
scholarships 0.035
music 0.035
musicians 0.031
awards 0.027
new mexico state university. department of military science 0.023
theater--production and direction 0.021
kennecott copper corporation 0.019
students 0.019
glowacki, john 0.019
new mexico state university symphonic band 0.015
new mexico state university. university-community chorus 0.015
lynch, daniel 0.015
drath, jan 0.015
performances 0.015
military art and science 0.012
united states. army--inspection 0.012

discussion

the major advantage of the method described above is that it reduces the long list of library of congress subject headings that catalogers need to consult before assigning subject headings to news releases. it is important to note that this method produces only subject headings that are already present in the training data. the list of available subject headings can be expanded by periodic updates of the training data to include all entries in the catalog, assuming catalogers will add, where needed, subjects not present so far in the data set. in this project we utilized metadata from just two fields: titles and subject headings. although documents’ titles are supposed to compactly represent the content of documents, we expect that the presented approach would give better results if the full text (ocr) was analyzed. in this project, the limiting factors were both the quality of print copies and the robustness of available ocr tools. in some cases, subject annotations are imperfect, depending on the skills and experience of catalogers. that also affects the performance of our method, which relies on the quality of subject assignments. on the other hand, there are cases when the method suggests subjects that fit the content of news releases but were not selected by catalogers. this indicates that the method can also be used to refine the existing annotations.

conclusion

we propose a way to streamline the workflow of metadata creation for university news releases by applying topic modeling. first, we use this digital technology to identify topics in a large collection of text documents. then, we associate the discovered topics with sets of subject headings.
finally, we assign to a new document those subject headings that are associated with the document’s most dominant topics. the proposed method facilitates the process of document annotation. it produces short lists of candidate subject headings that account for a significant part of original labeling performed by catalogers. this approach can be applied to support annotation of any large digital collection of text documents. one of the advantages of applying topic modeling is that it produces numeric representations of text documents. these numeric representations can be used by advanced analytical methodologies, including machine learning, for numerous practical purposes in library workflows like text categorization, collocation of similar materials, enhancing metadata for digital collections, finding trends in government literature, etc. in addition, mastering digital methodologies by librarians may open new ways of collaboration among them and digital scholars across university campuses. as johnson and dehmlow argue, “... digital humanities represent a clear opportunity for libraries to offer significant value to the academy, not only in the areas of tool and consultations, but also in collaborative expertise that supports workflows for librarians and scholars alike.”29 digital technologies are best learned in hands-on practice. if librarians are to contribute to the development of digital scholarship, then they need to learn how to apply new technologies to their own work. and since both librarians and humanists work with texts, they might have much to offer each other.

endnotes

1 anne burdick et al., digital_humanities (cambridge, massachusetts: the mit press, 2012), 32–33.
2 thomas g. padilla, “collections as data implications for enclosure,” acrl news 79, no.
6 (2018), https://crln.acrl.org/index.php/crlnews/article/view/17003/18751; rachel wittmann, anna neatrour, rebekah cummings, and jeremy myntti, “from digital library to open datasets: embracing a ‘collections as data’ framework,” information technology and libraries 38, no. 4 (december 2019), https://doi.org/10.6017/ital.v38i4.11101.
3 gerald w. thomas, academic ecosystem: issues emerging in a university environment (gerald w. thomas, 1998), 159–64.
4 david m. blei, andrew ng, and michael jordan, “latent dirichlet allocation,” journal of machine learning research 3, no. 1 (2003); david m. blei, “topic modeling and digital humanities,” journal of digital humanities 2, no. 1 (winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-and-digital-humanities-by-david-m-blei/.
5 megan r. brett, “topic modeling: a basic introduction,” journal of digital humanities 2, no. 1 (winter 2012), http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/; jordan boyd-graber, yuening hu, and david mimno, “applications of topic models,” foundations and trends® in information retrieval 11, no. 2–3 (2017): 143–296.
6 boyd-graber, hu, and mimno, “applications of topic models,” foundations and trends® in information retrieval 11, no. 2–3 (2017): 143–296; rania albalawi, tet hin yeap, and morad benyoucef, “using topic modeling methods for short-text data: a comparative analysis,” frontiers in artificial intelligence 3 (2020): 42, https://doi.org/10.3389/frai.2020.00042; hamed jelodar, yongli wang, chi yuan, xia feng, “latent dirichlet allocation (lda) and topic modeling: models, applications, a survey,” (2017), https://www.ccs.neu.edu/home/vip/teach/dmcourse/5_topicmodel_summ/notes_slides/lda_survey_1711.04305.pdf.
7 zhijun yin et al., “geographical topic discovery and comparison,” in www: proceedings of the 20th international conference on the world wide web (2011), https://doi.org/10.1145/1963405.1963443.
8 david andrzejewski and david buttler, “latent topic feedback for information retrieval,” in kdd '11: proceedings of the 17th acm sigkdd international conference on knowledge discovery and data mining (2011), https://dl.acm.org/doi/10.1145/2020408.2020503.
9 matt erlin, “topic modeling, epistemology, and the english and german novel,” cultural analytics 1, no. 1 (may 1, 2017), https://doi.org/10.22148/16.014.
10 cassidy r. sugimoto et al., “the shifting sands of disciplinary development: analyzing north american library and information science dissertations using latent dirichlet allocation,” journal of the american society for information science and technology 62, no. 1 (january 2011), https://doi.org/10.1002/asi.21435; david mimno, “computational historiography: data mining in a century of classics journals,” journal on computing and cultural heritage 5, no. 1 (april 2012): 3:1–3:19; andrew j.
torget and jon christensen, “mapping texts: visualizing american historical newspapers,” journal of digital humanities 1, no. 3 (summer 2012), http://journalofdigitalhumanities.org/1-3/mapping-texts-project-by-andrew-torget-and-jon-christensen/; andrew goldstone and ted underwood, “the quiet transformations of literary studies: what thirteen thousand scholars could tell us,” new literary history 45, (2014): 359–84; carlos g. figuerola, francisco javier garcia marco, and maria pinto, “mapping the evolution of library and information science (1978–2014) using topic modeling on lisa,” scientometrics 112, (2017): 1507–35, https://doi.org/10.1007/s11192-017-2432-9; jung sun oh and ok nam park, “topics and trends in metadata research,” journal of information science theory and practice 6, no. 4 (2018): 39–53; manika lamba and margam madhusudhan, “metadata tagging of library and information science theses: shodhganga (2013–2017),” paper presented at etd 2018: beyond the boundaries of rims and oceans globalizing knowledge with etds, national central library, taipei, taiwan, https://doi.org/10.5281/zenodo.1475795; manika lamba and margam madhusudhan, “author-topic modeling of desidoc journal of library and information technology (2008–2017), india,” library philosophy and practice (2019): 2593, https://digitalcommons.unl.edu/libphilprac/2593.
11 david j. newman and sharon block, “probabilistic topic decomposition of an eighteenth-century american newspaper,” journal of the american society for information science and technology 57, no. 6 (april 1, 2006): 753–67; robert k.
nelson, “mining the dispatch,” last modified november 2020, https://dsl.richmond.edu/dispatch/about; tze-i yang, andrew torget, and rada mihalcea, “topic modeling on historical newspapers,” in latech '11: proceedings of the 5th acl-hlt workshop on language technology for cultural heritage, social sciences, and humanities (2011), https://dl.acm.org/doi/10.5555/2107636.2107649; carina jacobi, wouter van atteveldt, and kasper welbers, “quantitative analysis of large amounts of journalistic texts using topic modelling,” digital journalism 4, no. 1 (2015), https://doi.org/10.1080/21670811.2015.1093271.
12 jonathan o. cain, “using topic modeling to enhance access to library digital collections,” journal of web librarianship 10, no. 3 (2016): 210–25, https://doi.org/10.1080/19322909.2016.1193455; alexandra lesnikowski et al., “frontiers in data analytics for adaptation research: topic modeling,” wires climate change 10, no. 3 (2019): e576, https://doi.org/10.1002/wcc.576.
13 tiziano piccardi and robert west, “crosslingual topic modeling with wikipda,” in proceedings of the web conference 2021 (www ’21), april 19–23, 2021, ljubljana, slovenia (acm, new york), https://doi.org/10.1145/3442381.3449805.
14 cain, “using topic modeling to enhance access to library digital collections,” 210–25; a. krowne and m.
halbert, “an initial evaluation of automated organization for digital library browsing,” in jcdl '05: proceedings of the 5th acm/ieee-cs joint conference on digital libraries, (june 7–11, 2005): 246–255; david newman, kat hagedorn, and chaitanya chemudugunta, “subject metadata enrichment using statistical topic models,” paper presented at acm ieee joint conference on digital libraries jcdl’07, vancouver, bc, june 17–22, 2007.
15 craig boman, “an exploration of machine learning in libraries,” ala library technology report 55, no. 1 (january 2019): 21–25.
16 julia silge and david robinson, text mining with r: a tidy approach (sebastopol, california: o’reilly media, inc., 2017), 90.
17 blei, ng, and jordan, “latent dirichlet allocation.”
18 arlene g. taylor, introduction to cataloging and classification, 10th ed. (westport, connecticut: libraries unlimited, 2006), 19–20, 301–14; arlene g. taylor and daniel n. joudrey, the organization of information, 3rd ed. (westport, connecticut: libraries unlimited, 2009), 303–28.
19 blei, ng, and jordan, “latent dirichlet allocation.”
20 silge and robinson, text mining with r, 149.
21 albalawi, yeap, and benyoucef, “using topic modeling methods for short-text data,” 42.
22 the r project for statistical computing, https://www.r-project.org/.
23 bettina grün and kurt hornik, “topicmodels: an r package for fitting topic models,” journal of statistical software 40, no. 13 (2011): 1–30, https://doi.org/10.18637/jss.v040.i13.
24 topic modeling in r (datacamp), https://learn.datacamp.com/courses/topic-modeling-in-r.
25 grün and hornik, “topicmodels.”
26 topic modeling in r (datacamp), chap. 3, https://learn.datacamp.com/courses/topic-modeling-in-r.
27 christopher m. bishop, pattern recognition and machine learning (new york, ny: springer science + business media, 2006), 32–33.
28 blei, ng, and jordan, “latent dirichlet allocation.”
29 daniel johnson and mark dehmlow, “digital exhibits to digital humanities: expanding the digital libraries portfolio,” in new top technologies every librarian needs to know, ed. kenneth j. varnum, (chicago: ala neal-schuman, 2019), 124.

article

rarely analyzed: the relationship between digital and physical rare books collections

allison mccormack and rachel wittmann

information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.13415

allison mccormack (allie.mccormack@utah.edu) is the original cataloger for special collections, university of utah. rachel wittmann (rachel.wittmann@utah.edu) is the digital curation librarian, university of utah. © 2022.
abstract the relationship between physical and digitized rare books can be complex and, at times, nebulous. when building a digital library, should showcasing a representative slice of the physical collection be the goal? should stakeholders focus on preservation, high-use items, or other concerns? to explore these conundrums, a special collections librarian and a digital services librarian performed a comparative analysis of their library’s physical and digital rare books collections. after exporting marc metadata for the rare books from their ils, the librarians examined the place of publication, publication date, and broad subject range of the collection. they used this data to create a variety of visualizations with the open-source digital humanities tool tableau public. next, the authors downloaded the rare books metadata from the digital library and created illuminating data visualizations. were the geographic, temporal, and subject scopes of the digital library similar to those of the physical rare books collection? if not, what accounted for the differences? the implications of these and other findings will be explored. introduction as of august 2019, the special collections division of the university of utah j. willard marriott library held over 256,000 printed works and archival collections. approximately 22% of the collection, or just over 55,000 works, belongs to the rare books department (https://lib.utah.edu/collections/rarebooks/), which contains not only books but serials, maps, manuscripts, ephemera, and other formats. the collection covers over 4,000 years of human history, with its earliest piece, a cuneiform tablet, dating to the mid-twenty-third century bce; contains works from nearly 100 different countries; and represents a wide variety of topics, including the exploration and settlement of the american west and the history of the book. 
the rare books department, a subset of special collections, specifically seeks to document the history of written human communication and actively collects historical items to enhance teaching and research at the university of utah. the marriott library has been adding digitized works from the rare books department to its digital library (https://collections.lib.utah.edu/) for over 25 years. approximately 780 works, or 1.42% of the rare books collection, have been digitized to date. however, no formal collection development plan was ever written, and rare books were selected for digitization by both curators and patrons. unfortunately, the reason a particular item was digitized is not recorded in the system: it is unclear if age, research value, physical condition, a desire to bring forward underrepresented stories, or a combination of these and other factors influenced the decision to digitize a rare book. this piecemeal approach to digital library collection development, while not uncommon, made it difficult for library staff and patrons to determine the relationship between the digital and physical collections of rare books. it also presented challenges when library staff attempted to communicate the scope and intent of the digital library to patrons, who assumed that the digitized items were representative of the overall collection. given their expertise in library metadata, the authors decided to analyze both traditional library catalog records and digital library records for the rare books collection and explore whether the digital collection was proportionally representative of the physical collection or if it differed in geographic, temporal, or subject scope in a meaningful way.
they then created a series of data visualizations to better communicate information about the library’s rare books holdings.

literature review

while much has been written about methods and criteria for selecting special collections items to be digitized and the effects of digitization on collection accessibility, few authors have discussed the relationships between digital collections and the physical collections from which they were sourced. in their highly detailed treatise on selection strategies for digitization, ooghe and moreels identify representativity, a method that “aims for a final selection that provides a representative view of the original collections,” as one of 25 selection criteria for digitization projects.1 however, alexandra mills notes that “without a thorough understanding of the institution and collections, it is impossible to create truly representative collections.”2 because many digitization initiatives are undertaken in response to user requests, preservation concerns, or the availability of project-based funding, it is likely that most libraries do not plan for their digital collections to be representative of their overall special collections holdings.
as peter michel states, the digital collections at the university of nevada, las vegas, were explicitly built with popular history and popular culture in mind and were never intended to be “surrogates of the collection.”3 bradley daigle of the university of virginia explained that digitization could be undertaken to alleviate preservation concerns, respond to defined research needs, or to brand certain online content, but this approach could give the mistaken impression “that only the important materials are digitized.”4 despite the gaps in the literature, having an explicit collection development policy is still considered paramount; indeed, it is the very first principle listed in the national information standards organization (niso)’s framework for building “good” digital collections.5 to investigate this type of documentation further, a google search was employed using the search term “digital collection development policy site:edu”. this yielded 10 publicly accessible digital collection development policies from academic libraries in the united states:6

• amherst college library (https://www.amherst.edu/library/services/digital/digitalcolldev)
• emerson college archives and special collections (https://www.emerson.edu/policies/digital-collections-development-policy)
• colorado state university libraries (https://lib.colostate.edu/digital-collection-development-policy/)
• florida atlantic university digital library (https://library.fau.edu/policy/digital-library-collection-development-policy)
• georgetown university library (https://www.library.georgetown.edu/digital-project-policy)
• northern illinois university digital library (https://digital.lib.niu.edu/policy/collection-development-policy)
• oregon health and sciences university digital collections (https://www.ohsu.edu/library/ohsu-digital-collections-development-policy)
• university of north texas university libraries (https://library.unt.edu/policies/collection-development-digital-collections/)
• wesleyan university digital library (https://digitalcollections.wesleyan.edu/about/what-we-collect)
• williams college special collections (https://specialcollections.williams.edu/collection-development-policies/digital-collections/)

in reviewing the sample of 10 universities’ digital collection development policies, it becomes apparent that their content is largely homogeneous. almost all of the policies include a mission statement, scope, and selection criteria for potential digital collection items. all policies include criteria that physical materials should meet in order to qualify for digitization. the most common criteria for digitization are that materials are rare or unique, high-use, fragile, important to institutional or regional history, and/or supportive of campus curriculum or faculty research. in addition, a requirement of clearance to publish materials online is ubiquitous among the policies. materials eligible for online display must either be in the public domain or have their intellectual property rights held by the institution, and materials currently under copyright must receive permission from the copyright holder.
a measured approach to digitization qualification has been employed by the university of north texas (unt) libraries’ digital collections and the northern illinois university digital library (niudl). unt libraries’ digital collections policy lists levels of criteria that materials must meet in order to be digitized and included in the digital library; to qualify for digitization, a candidate must meet all criteria on level one but only one criterion from level two. niudl uses a priority factor rubric that pairs criteria categories with a corresponding numerical scale; the maximum score is 35 points, and a higher value signifies an elevated priority.

six of the 10 policies prioritize materials that support diversity and inclusion missions on campus. amherst college has leveraged its digital collection development policy to include content that would increase perspectives of underrepresented groups within the digital collections and traditionally underrepresented groups more broadly. niudl includes marginalized groups as a collection priority area in order to “deepen public understanding of the histories of people of color and other communities and populations whose work, experiences, and perspectives have been insufficiently recognized or unattended” and lists over 20 such groups.

the collection candidate’s relationship to other collections is outlined in four of the 10 policies. georgetown university requires that “the materials form a coherent collection, fill gaps in existing collections, or complement existing collection strengths.” amherst college evaluates whether digitization would “enhance public awareness of archives’ collection strengths.” another function of a digital collection development policy is to inform the public about the scope and provenance of the contents of the digital library.
the unt digital collection policy includes a section outlining the content contributors, including partners, which can be beneficial for large-scale digital libraries that host collections from multiple partners. unt is also exemplary in defining collection curators and their responsibilities while underscoring that this role is likely to change over time and is not tied to an individual.

with no written digital collection development policy regarding special collections at the marriott library, the authors would first have to analyze both the physical and digital special collections before determining what factors may have influenced the digitization of these materials.

libraries are gathering massive amounts of data, ranging from the metadata of their varied collections to patron usage statistics of both physical and digital collections. interpretation of the ever-growing accumulation of data can quickly become complex. by visualizing data, we are able to interpret large and often messy sets of data while processing multiple aspects of the data concurrently.
for example, the ohio state university (osu) libraries used tableau desktop to combine data from various departments in order to better manage and explore information.7 tableau was osu’s data visualization software of choice due to its ease of use and accessibility, and the program was also used to create dashboards that blend data from various sources for real-time visualizations.

bibliographic metadata cleanup

to understand the marriott library’s collections, one must first understand the relevant metadata, which for the rare books department is in the machine-readable cataloging (marc) format. a popular criticism of marc, commonly used in traditional library cataloging, is that the schema is highly regulated and, at times, redundant. however, for the purposes of this project, those qualities proved to be a boon. an older, uncorrected record in the digital library might list london as the place of publication for a particular book, but it was not immediately apparent if that referred to london, england; london, ontario; or london, ohio. however, a marc record would not only list a book’s city of publication in the 260 or 264 field but would also contain a two- or three-letter code in the 008 field that specified the country, us state, canadian province or territory, or australian state or territory in which it was published. for this reason, the authors decided to base their analysis on marc record data from the physical collection instead of the dublin core metadata used in the digital library.

in order to tease out the relationships between our digital and physical collections, each of the approximately 55,000 rare books bibliographic records stored in alma, the marriott library’s cloud-based library services platform, would have to have a common set of data points that could be compared. for the purposes of this analysis, the authors chose to investigate the place of publication and the subject of each work.
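To make the role of the 008 field concrete, the following is a minimal Python sketch of how a place-of-publication code could be read from it. It relies on the MARC 21 convention that character positions 15-17 of the bibliographic 008 field hold the place code; the function name, the sample field values, and the decision to treat `xx` (no place, unknown, or undetermined) as missing are illustrative assumptions, not the authors' workflow.

```python
from typing import Optional

# Codes treated as "no usable place" in this sketch (an assumption for illustration).
NO_PLACE_CODES = {"xx"}


def place_code_from_008(field_008: str) -> Optional[str]:
    """Return the place-of-publication code from a MARC 008 field, or None."""
    # Positions 15-17 (zero-based) of the bibliographic 008 hold the place code.
    if len(field_008) < 18:
        return None
    code = field_008[15:18].strip().lower()
    # Blanks or fill characters ('|') mean no code was recorded.
    if not code or set(code) <= {"|"} or code in NO_PLACE_CODES:
        return None
    return code


# Hypothetical sample fields: 15 leading characters (date entered, date type,
# date 1, date 2), then the 3-character place code, then padding to 40 characters.
PREFIX = "850423" + "s" + "1940" + " " * 4
london_england = PREFIX + "enk" + " " * 22
leiden = PREFIX + "ne " + " " * 22
unknown_place = PREFIX + "xx " + " " * 22

print(place_code_from_008(london_england))
print(place_code_from_008(leiden))
print(place_code_from_008(unknown_place))
```

A code such as `enk` (England), `onc` (Ontario), or `ohu` (Ohio) is what resolves the London ambiguity described above, which a free-text city name alone cannot do.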
despite the relative rigidity of marc metadata, some of the alma records lacked country of publication data in the 008 field. these records were not incorrect, but merely outdated: some had been copied directly from paper catalog cards when the library first transitioned to a computer-based cataloging system, while others were created using different metadata standards. approximately 6,000 rare books either completely lacked a country code in the 008 field or had data that could possibly be enhanced by, for example, replacing a code for the united states with a code for a particular state. instead of editing all 6,000 records by hand, the cataloger wrote several metadata normalization rules in alma to automatically correct the most obvious errors. records that listed chicago as the place of publication were assigned the marc geographic code for illinois, while those that were published in lugduni batavorum, the latin designation for leiden, were given the geographic code for the netherlands. however, 3,000 records were unable to be enhanced in this manner, either because their place of publication was an ambiguous city name like cambridge or because the place of publication was listed as unknown. the cataloger examined each record individually and was ultimately unable to assign a marc geographic code to 1,682 records, most of which were arabic manuscripts or advertising pamphlets that simply did not list a place of publication or creation. while these records would be excluded from the place of publication analysis, they could be mined for data on other topics. with the marc records as complete as possible, the metadata was exported from alma into an excel spreadsheet and given to the metadata librarian for further manipulation.
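The logic of such normalization rules can be illustrated with a short Python sketch. This is not Alma's rule syntax; it is a hypothetical stand-in showing the same idea: unambiguous place names map directly to a code, while ambiguous or unknown places are left for manual review. The rule tables and function names are assumptions for illustration.

```python
from typing import Optional

# Hypothetical rule table: normalized place string -> MARC place code.
UNAMBIGUOUS_PLACES = {
    "chicago": "ilu",            # Illinois
    "lugduni batavorum": "ne",   # Latin name for Leiden, Netherlands
}

# Places that could refer to more than one location are never auto-assigned.
AMBIGUOUS_PLACES = {"cambridge", "london"}


def normalize_place(place: str) -> str:
    """Strip cataloging punctuation such as brackets and trailing colons."""
    return place.strip(" []:;,.?").lower()


def suggest_code(place: str) -> Optional[str]:
    """Return a code for an unambiguous place, or None to flag manual review."""
    key = normalize_place(place)
    if key in AMBIGUOUS_PLACES:
        return None
    return UNAMBIGUOUS_PLACES.get(key)


for raw in ["Chicago :", "[Lugduni Batavorum] :", "Cambridge", "[S.l.]"]:
    print(repr(raw), "->", suggest_code(raw))
```

Returning `None` rather than guessing mirrors the split described above between records corrected automatically and records examined individually.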
metadata transformation & visualization creation

the next phase involved standardizing the raw metadata to replace marc codes with the human-readable data needed to produce data visualizations. once the physical rare books’ bibliographic metadata was updated in alma, it was then exported as a comma-separated values file. the raw data export produced a massive spreadsheet containing over 50,000 marc records. these records included two- and three-letter location codes for the place of publication from the library of congress marc code list for geographic areas. two-letter codes are used for most countries, while three-letter codes are used for states within the united states, provinces within canada, and territories within the united kingdom. while this additional level of location data was available for books from the united kingdom and canada, it was decided to review the collection at a country level for consistency and map display. books from the united states, however, were analyzed on a state level, considering the research is germane to an american institution.

using a list correlating these codes to the location name provided by the library of congress (https://www.loc.gov/marc/countries/countries_code.html), a vlookup formula was used in microsoft excel to add the location names to the marc records. the vlookup formula pulls in data from one table to another as long as the two tables have one data field in common. in this exercise, both tables of data contained the library of congress location codes; therefore, the lc location codes were used to add the location names to the table containing the marc metadata. once the location names were added, some additional quality control steps were required, as lc location names that included outdated country names posed problems for mapping the data to current country names and boundaries.
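For readers who do not work in Excel, the VLOOKUP step described above amounts to a keyed lookup from one table into another. The sketch below shows an equivalent in Python using only the standard library; the two small CSV snippets, the column names, and the `#N/A` placeholder for unmatched codes are made-up stand-ins, not the authors' data.

```python
import csv
import io

# Made-up stand-ins for the two spreadsheets: a code list and exported records.
CODE_LIST_CSV = """code,name
enk,England
ilu,Illinois
ne,Netherlands
"""

RECORDS_CSV = """record_id,place_code
r1,enk
r2,ilu
r3,zzz
"""


def load_code_names(code_list_csv):
    """Build the lookup table: place code -> location name."""
    reader = csv.DictReader(io.StringIO(code_list_csv))
    return {row["code"].strip(): row["name"].strip() for row in reader}


def add_place_names(records_csv, code_to_name, missing="#N/A"):
    """Return record rows with a place_name column filled from the lookup table."""
    reader = csv.DictReader(io.StringIO(records_csv))
    return [
        {**row, "place_name": code_to_name.get(row["place_code"].strip(), missing)}
        for row in reader
    ]


code_to_name = load_code_names(CODE_LIST_CSV)
rows = add_place_names(RECORDS_CSV, code_to_name)
for row in rows:
    print(row["record_id"], row["place_code"], row["place_name"])
```

As with VLOOKUP, the shared field (here the place code) is the only link between the two tables, so codes that are missing from the code list surface as unmatched rows that need the kind of quality control described in the text.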
to resolve outdated country names, we merged the codes for east germany and west berlin into the one representing contemporary germany. for countries that have since been dissolved and divided into multiple countries, e.g., the ussr and czechoslovakia, the records were manually checked for city names and then added to the current country. once this process was completed, the results showed the rare books were published in 97 countries and all 50 us states, as well as the district of columbia.

examining the subject content of the rare books physical collection was another aspect of analysis for this project. in contemplating this analysis, the authors considered using the lc subject heading field; however, faceting of lc subject headings and the structure of the exported data posed too many issues for a rather simple analysis. instead, the library of congress call number was used to extract high-level lc classification information for each work by separating the first two letters of the call numbers included in the exported marc metadata, which indicated lc class and subclass. to add the lc class and subclass names to these letters, a vlookup formula was used again to match the letter codes to the list of lc classification categories. once classification categories were added to the 55,000 records, works from all 21 lc master classes and 190 subclasses were represented in the rare books collection.

in addition to the physical rare books collection held at the marriott library, there is a selection of this collection that has been digitized and is accessible in the marriott digital library. the rare books digital collection (https://collections.lib.utah.edu/search?facet_setname_s=uum_rbc) comprises 780 works, although this number includes unique records for individual volumes within a series and therefore is not a true comparison to marc metadata records, which contain one record for a series.
for example, the silver reef miner, a newspaper “devoted to the mining interests of southern utah” published during the late nineteenth century, has 30 individual volumes in the digital library, but these are represented in just one marc record. in order to compare the digital collection to the physical collection, the datasets would need to have consistent data for comparison, namely place of publication and lc classification derived from call numbers. the digital collection metadata is in the dublin core schema, which does not include all of the metadata found in the marc metadata, nor does it use the same format. while there is a dublin core spatial element used to capture geographic data on what the item is about, this does not always align neatly with the location of an item’s publication. for example, reise in das innere nord-america in den jahren 1832 bis 1834 (2 volumes) is a book printed in germany that documents an expedition to north america in 1832–1834 and includes illustrations of native american people by the swiss artist karl bodmer. for these volumes, the appropriate dublin core spatial data would include the specific regions the expedition traveled to in north america; the marc 26x field, however, contains koblenz, germany, the city where the volumes were published. call number data was included for many digitized works, but not in a consistent format. in order to use the same data to compare the physical rare books collection to the digital one, the digital collection metadata was updated with the improved, accurate call numbers found in the marc metadata.
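The call-number-based classification used for both collections can be sketched in a few lines of Python. The text describes separating the first two letters of each call number; the version below instead takes the leading run of letters, an assumption made so that single-letter classes (such as Z for bibliography) do not pick up a digit. The function name and the sample call numbers are made up for illustration.

```python
import re
from typing import Optional, Tuple

# Leading alphabetic prefix of an LC call number (one to three letters).
LC_PREFIX = re.compile(r"^\s*([A-Za-z]{1,3})")


def lc_class_and_subclass(call_number: str) -> Optional[Tuple[str, str]]:
    """Return (class letter, subclass letters), or None if no LC prefix is found."""
    match = LC_PREFIX.match(call_number)
    if match is None:
        return None
    letters = match.group(1).upper()
    # The first letter is the main class; the full prefix is the subclass.
    return letters[0], letters


# Made-up call numbers in inconsistent formats.
for call_number in ["QM21 .V418 1543", "Z116 .A2", "  ps3545 .I5", "folio 1543"]:
    print(repr(call_number), "->", lc_class_and_subclass(call_number))
```

Running the same function over call numbers from both the MARC export and the digital collection metadata would give the two datasets a comparable class and subclass column, provided the call numbers themselves have been reconciled as described above.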
another improvement to the digital collection metadata was the addition of the metadata management system (mms) id, a unique numerical identifier that aids in locating a record in the alma system. when the rare books’ descriptive metadata was originally converted to dublin core during the digitization process, some titles and call numbers were changed and became different from their physical counterparts. the inclusion of the mms id allows for a consistent identifier between the physical and digital collections.

when selecting data visualization software, being able to create a map of the places where books in the rare collection were published was a priority. considering the goal of creating an easily replicable workflow for other libraries, the authors sought a freely accessible program that did not require advanced geospatial skills, unlike esri’s arcgis software. tableau software is a data visualization software package with both a public and a desktop version. the tableau desktop version requires a subscription fee while tableau public is open access. for the purposes of this study, tableau public offered open access and mapping features that can be used without any geospatial knowledge.

analysis

creating a variety of data visualizations made information about the rare books physical and digital collections more apparent than merely browsing entries in a spreadsheet. for example, there are numerous geographic disparities between the two collections of rare materials, as shown in the american states in which works from the collections were published. while books from all 50 states are found in the physical collection (fig. 1), only 18 states are represented in the digital library (fig. 2), with new york being the state in which the highest number of books were published. as new york city has long been a major publishing center in the united states, the authors were not surprised by this.
however, the subsequent states were quite different: california and utah ranked second and third for the physical collection, while massachusetts and pennsylvania claimed those spots for the digital library. the authors believe several factors might influence this discrepancy. first, works can only be added to the digital library if they are no longer in copyright, and states with longer histories of european-american settlement are more likely to have published books that are now out of copyright. furthermore, these older books are more likely to be in a fragile condition and therefore may have been digitized to decrease the amount of physical handling to which they are subjected.

figure 1. marriott library physical rare books by us state.

figure 2. marriott library digital rare books by us state.

there are other discrepancies when comparing the country of publication between the physical (fig. 3) and digital collections (fig. 4). while 61% of the physical rare books were published in the united states, only 20% of the digitized works were published in this country. the authors expected to see egypt rank highly in the physical collection, as many of the rare books were purchased by former university of utah professor dr. aziz atiya to support the middle east center for research he founded; similarly high in rank, britain, germany, france, and italy were all major centers for the early printing and publishing trade in early modern europe. however, there is strong geographic bias in the digital collection, as only north america, western europe, and one african country are represented online.
copyright may again be a factor, as the earliest books from non-western countries in the collection often date to the twentieth century, but a eurocentric or other bias cannot immediately be discounted. while the physical collection contains many more european imprints than imprints from the global south, it is much more diverse than the digital collection.

figure 3. marriott library physical rare books by country.

figure 4. marriott library digital rare books by country.

the subjects represented in the collection proved somewhat challenging to analyze. due to the nature and structure of library of congress subject headings, which attempt to mirror natural language and may be composed of “strings” of phrases to represent complex topics, no tableau public visualization could be created that effectively grouped similar content areas together without looking quite fragmented. instead, the authors based their analysis of subjects on library of congress classification numbers (i.e., call numbers) assigned to works, which, though not exact, can be understood as distillations of the subject of a work.8 once again there were considerable differences between the physical and digital rare books collections (fig. 5). as in many generalized special collections, literature and history make up significant portions of the physical collection. however, works on bibliography, or the study of books and book history, comprise a notable percentage of the collection. many of these are modern works on book history and special collections librarianship and therefore are unable to be digitized due to copyright law. nearly 9% of the digital collection is on the sciences, though these works comprise only 3% of the physical collection.
while this portion of the holdings may be relatively small, it contains many scientific high points such as vesalius’ de humani corporis fabrica, early printings of ancient mathematical texts, and the journals of major scientific societies, which may have been digitized both for physical preservation as well as high interest on the part of students and faculty on campus.

figure 5. percentage of rare books physical and digital collections by library of congress class.

next steps

now that the first phase of the project is complete, the authors would like to conduct additional analyses. first, they plan to compare the usage statistics of the digital rare books collection to the circulation statistics of the physical collection. this method of inquiry was not possible at the start of the project, as circulation information for the rare books was previously not tracked in the integrated library system. now that rare books are checked out to patrons for use in the special collections reading room, this data can be quickly pulled from alma. once there is a year’s worth of circulation data for the rare books unhindered by the changes necessitated by the coronavirus pandemic, the authors will compare the usage statistics of the digital collection for the same time period. do patrons in the reading room look at similar materials to online patrons, or are their interests vastly different? are some rare books used so frequently that they would benefit from the added physical security that digitization brings?

the authors also plan to pull annual usage statistics from the digitized rare books and share this with special collections division leadership. online patrons are still library patrons, and the division can use the viewing data to show the national and international reach of the collection.
relatedly, the authors will investigate the digital library usage data in more depth. do patrons from utah, the united states, and the world look at similar materials, or are there geographic divides among the online patrons? do countries that are home to a majority of the university’s international student body have higher viewership numbers? finally, the authors wish to convene a group of stakeholders to create a formal collection development plan for the rare books component of the digital library. given the library’s limited resources, it is imperative that digitization be done thoughtfully and systematically. there is a good rationale for creating a digital collection that is representative of the physical rare books collection as well as one that highlights certain collection areas. both material fragility and the modern scholarly emphasis on highlighting the stories of people of color, women, and other underrepresented groups in library collections provide strong counterarguments to making digital libraries strictly representative of their physical counterparts. since informal conversations with patrons of the marriott library revealed that they assumed the digital library was representative of the collection overall, it is imperative that this assumption be either confirmed or disclaimed in a publicly viewable statement. in the case of the rare books department, the authors are in favor of a focused, rather than representative, collection development policy. firstly, many of the books in the collection are under copyright and therefore cannot be digitized, while other materials like reference sources for rare books librarians will be of limited interest to the general public. furthermore, complex items such as artists’ books are often poor candidates for digitization, as they may have movable components that cannot be captured accurately in a still photograph. 
as for what should be included online, the authors fully support equity, diversity, and inclusion efforts at the university of utah and would like to see the digital library highlight materials from marginalized groups whenever possible. usage statistics from the physical and digital collections, when they become available, should also inform the collection development policy to encourage traffic to the digital library. whatever is ultimately decided, however, the clarity a written policy provides will help streamline decision-making and ultimately help both library staff and patrons understand and search within the digital library much more effectively.

endnotes

1 bart ooghe and dries moreels, “analysing selection for digitisation: current practices and common incentives,” d-lib magazine 15, no. 9/10 (2009), https://doi.org/10.1045/september2009-ooghe.
2 alexandra mills, “user impact on selection, digitization, and the development of digital special collections,” new review of academic librarianship 21, no. 2 (2015): 166, https://doi.org/10.1080/13614533.2015.1042117.
3 peter michel, “digitizing special collections: to boldly go where we’ve been before,” library hi tech 23, no. 3 (2005): 382, https://doi.org/10.1108/07378830510621793.
4 bradley j. daigle, “the digital transformation of special collections,” journal of library administration 52, no. 3–4 (2012): 253, https://doi.org/10.1080/01930826.2012.684504.
5 niso framework working group, a framework of guidance for building good digital collections (2007), https://www.imls.gov/sites/default/files/publications/documents/framework3.pdf.
6 the urls in the following list were accurate as of march 2, 2022.
7 sarah anne murphy, “data visualization and rapid analytics: applying tableau desktop to support library decision-making,” journal of web librarianship 7, no. 4 (2013): 465–76, https://doi.org/10.1080/19322909.2013.825148.
8 readers who do not work with marc metadata may not be familiar with how library of congress call numbers are assigned. created in 1891, the classification system is based on 21 classes designated by a single letter; subclasses add one or two letters to the initial class. catalogers must choose which one of the classes to assign to a particular work. the subject headings may guide a cataloger towards a certain class, but there is not a 1:1 relationship between subject headings and call number classes.

article

contactless services: a survey of the practices of large public libraries in china

yajun guo, zinan yang, yiming yuan, huifang ma, and yan quan liu

information technology and libraries | june 2022
https://doi.org/10.6017/ital.v41i2.14141

yajun guo (yadon0619@hotmail.com) is professor, school of information management, zhengzhou university of aeronautics. zinan yang (yangzinan612@163.com) is master, school of information management, zhengzhou university of aeronautics. yiming yuan (yuanyiming361@163.com) is master, school of information management, zhengzhou university of aeronautics. huifang ma (mahuifang126@126.com) is master, school of information management, zhengzhou university of aeronautics.
*corresponding author yan quan liu (liuy1@southernct.edu) is professor, department of information and library science, southern connecticut state university. © 2022.

abstract

contactless services have become a common way for public libraries to provide services. as a result, the strategy used by public libraries in china will effectively stop the spread of epidemics caused by human touch and will serve as a model for other libraries throughout the world. the primary goal of this study is to gain a deeper understanding of the contactless service measures provided by large chinese public libraries for users in the pandemic era, as well as the challenges and countermeasures for providing such services. the data for this study was obtained using a combination of website investigation, content analysis, and telephone interviews for an analytical survey study of 128 large public libraries in china. the study finds that touch-free information dissemination, remote resources use, no-touch interaction self-services, network services, online reference, and smart services without personal interactions are among the contactless services available in chinese public libraries. exploring the current state of contactless services in large public libraries in china will help to fill a need for empirical attention to contactless services in libraries and the public sector. up-to-date information to assist libraries all over the world in improving their contactless services implementation and practices is provided.

introduction

the spread of covid-19 began in 2020, and people all over the world are still fighting the severity of its spread, the breadth of its impact, and the extent of its endurance. the virus’s continued spread has had a wide-ranging impact on industry sectors worldwide, including libraries. the growth of public libraries has also seen significant changes as a result of covid-19, resulting in added patron services, including contactless services.
contactless services are those that patrons can use without having to interact face to face with librarians. these services transcend time and geographical constraints and lower the danger of disease transmission through human interaction. since the start of the covid-19 pandemic, contactless or touch-free interaction services have been emerging in chinese public libraries. this service model can also serve as a reference for other libraries. this study evaluates and analyzes contactless service patterns in large public libraries in china, and then suggests a contactless service framework for public libraries, which is currently in the process of being implemented.

literature review

the available literature shows that the term “non-contact” appeared as early as 1916 in the article “identification of the meningococcus in the naso-pharynx with special reference to serological reactions” and described a patient’s infection in the context of medical research.1 in recent years, with the widespread application of “internet +” and the development and promotion of technologies such as the internet of things, cloud computing, and artificial intelligence, the contactless economy has grown by leaps and bounds, and so has the research on library contactless services.2 library contactless services encompass a wide range of services such as self-services, online reference, and smart services without personal interactions.

library self-service has become a major service model for contact-free services.
the self-service model was first adopted in american public libraries in the 1970s with the emergence of self-service borrowing and returning practices.3 many public libraries have since adopted stand-alone, fully automated self-service halls, self-service counters, etc.4 by the 1990s, a range of commercial self-service kiosks and self-service products had been introduced.5 currently, the most mature self-service type used by the library community is the circulation self-service product.6 in addition to self-service borrowing and returning of titles, libraries have launched self-service printing systems, self-service computer systems, and self-service booking of study spaces.7 as an example, patrons can complete printing operations using a self-service system and can offer payment by bank card, alipay, wechat, and other means.8 a face recognition system can also be used to borrow and return books, a solution for patrons who forget their library cards.9 these library self-service system elements are confined to simple, repetitive, and routine tasks such as conducting book inventories, book handling, circulating books, and the like, whose development stems from the widespread application of electronic magnetic stripe technology and radio frequency identification (rfid), optical character recognition (ocr) technology, and face recognition.10 new applications of technology continue to advance the development of contactless services in libraries. the overall work and service processes of the library have been made intelligent to varying degrees.

online reference is an important service in the contactless service program. researchers have started to study the current state of library reference services.
interactive online reference services support patrons using the library, including how to search for literature, locate and renew books, schedule a study or seminar room, and participate in other library activities, such as seminars, lectures, etc.11 to address how patrons access the library’s various services, digital reference systems need functions such as automated semantic processing and automated scene awareness so that, through automatic calculation and adaptive matching, they can understand patrons’ interests, preferences, and needs and recommend the most suitable information resources for them.12 at present, most library reference services in china mainly include the use of telephone, email, wechat, robot librarians/interactive communication, microblogs, and qq, an instant messaging software popular in china. during the past two years, most public libraries in china have essentially implemented the use of the aforementioned reference tools to communicate and interact with patrons, with wechat having a 55.6% adoption rate when compared to other instant reference tools.13 the use of online chat in reference services has allowed librarians to help patrons from anywhere and at any time through embedding chat plug-ins into multiple pages of the library website and directing patrons to ask questions based on the specific page they are viewing, setting up automatic pop-up chat windows, and changing patrons’ passive waiting to active engagement.14 in terms of technology, emerging technologies such as patron profiling, natural language processing, and contextual awareness can support the development of reference advisory services in libraries.15 the online reference service provides a 24/7, high-quality, efficient, and personalized service that connects libraries more closely with society and is an important window in the future smart library service system.

smart services without personal interactions may become the most popular form of library services development for the future, and research on library smart services has gradually deepened. in terms of conceptual definition, the library community generally understands the concept of library smart services as mobile library services that are not limited by time and space and can help patrons find books and other types of materials in the library by connecting to the wireless internet.16 apart from this, there are two other ways to define library smart services. one discusses the meaning of smart services in an abstract way, for example, holding that library smart services should be an advanced library form dedicated to knowledge services through human-computer interaction, a comprehensive ecosystem.17 the other makes the concept concrete by expressing it as a formula: “smart library = library + internet of things + cloud computing + smart devices.”18

applied technology research is an important part of smart services in libraries. library smart services have three main features: digitization, networking, and clustering.
among them, digitization provides the technical basis, networking provides the information guarantee, and clustering provides the library management model of resource sharing, complementary advantages, and common development among libraries.19 the key breakthrough in the development of smart services is the application and deployment of smart technologies to truly realize a new form of integration of online and offline, virtual and real.20 the integration of face recognition technology in traditional libraries, as well as its application to services like access control management, book borrowing and returning, and wallet payment, can help libraries build smart services faster.21 the integration of deep learning into a mobile visual search system for library smart services can play an important role in integrating multiple sources of heterogeneous visual data and the personalized preferences of patrons.22 blockchain technology, born out of the impact of the new wave of information technology, has also been applied to the construction of smart library information systems because of its decentralized and secure features.23 library smart services can leverage new technologies and smart devices to enhance the efficiency of library contact-free services and provide new opportunities for knowledge innovation, knowledge sharing, and universal participation, thereby enabling innovation in service models.

the research reviewed above addresses the development of contactless services in areas such as library self-services, online reference, and smart services. in particular, the research and construction of smart library services have been enriched with the advent of big data and artificial intelligence. however, non-contact service has not been systematically researched and elaborated in domestic and international librarianship.
the emergence and prevalence of covid-19 has prompted libraries in many countries to practice various types of touch-free services, such as the introduction of postal delivery, storage deposit, and click-and-collect in australian libraries; curbside pickup service or build a book bag service in us public libraries; and book-delivery-to-building services in chinese university libraries.24 therefore, a systematic investigation and study of contactless services in public libraries during the pandemic is of great importance for the adaptation and innovation of library services.

methods

survey samples

the survey selected some of the most typical public libraries for the study. the selection criterion was large public libraries in the more economically and culturally developed regions of china. a total of 128 large public libraries were identified, including the national library, 32 provincial public libraries, and municipal public libraries in the top 100 cities by gdp ranking in 2020. five public libraries, including the capital library and nanjing library, are both top 100 city libraries and provincial libraries, so each is counted once (1 + 32 + 100 − 5 = 128). these 128 large public libraries clearly reflect the current service level of the better developed public libraries in china and represent the highest level of public library construction in china. (see table 1 for a list of the libraries studied.)

table 1. a list of the 128 public libraries that were studied

no. library
1. national library of china
2. hebei library
3. shanxi library
4. liaoning provincial library
5. jilin province library
6. heilongjiang provincial library
7. zhejiang library
8. anhui provincial library
9. fujian provincial library
10. jiangxi provincial library
11. shandong library
12. henan provincial library
13. hubei provincial library
14. hunan library
15. guangzhou library
16. hainan library
17. sichuan library
18. guizhou library
19. yunnan provincial library
20. shanxi library
21. gansu provincial library
22. qinghai library
23. guangxi library
24. inner mongolia library
25. tibet library
26. ningxia library
27. xinjiang library
28. shanghai library
29. capital library of china
30. shenzhen library
31. guangzhou digital library
32. chongqing library
33. tianjin library
34. suzhou library
35. chengdu public library
36. wuhan library
37. hangzhou public library
38. nanjing library
39. qingdao library
40. wuxi library
41. changsha library
42. ningbo library
43. foshan library
44. zhengzhou library
45. nantong library
46. dongguan library
47. yantai library
48. quanzhou library
49. dalian library
50. jinan library
51. xi’an public library
52. hefei city library
53. fuzhou library
54. tangshan library
55. changzhou library
56. changchun library
57. guilin library
58. harbin library
59. xuzhou library
60. shijiazhuang library
61. weifang library
62. shenyang library
63. wenzhou library
64. shaoxing library
65. yangzhou library
66. yancheng library
67. nanchang library
68. zibo library
69. kunming library
70. taizhou library
71. erdos city library
72. public library of jining
73. taizhou library
74. linyi library
75. luoyang library
76. xiamen library
77. dongying library
78. nanning library
79. zhenjiang library
80. jiaxing library
81. xiangyang library
82. jinhua library
83. yichang library
84. huizhou tsz wan library
85. cangzhou digital library
86. zhangzhou library
87. weihai library
88. digital library of handan
89. guiyang library
90. sun yat-sen library of guangdong province
91. ganzhou library
92. baotou library
93. huaian library
94. yulin digital library
95. dezhou network library
96. yuyang library
97. changde library
98. baoding library
99. the library of jiujiang city
100. taiyuan library
101. hohhot library
102. wuhu library
103. langfang library
104. national library of hengyang city
105. maoming library
106. nanyang library
107. heze library
108. urumqi library
109. zhanjiang library
110. zunyi library
111. shangqiu library
112. jiangmen library
113. liuzhou library
114. zhuzhou library
115. xuchang library
116. chuzhou library
117. lianyungang library
118. suqian library
119. mianyang library
120. zhuhai library
121. xinyang library
122. zhoukou library
123. zhumadian library
124. huzhou library
125. lanzhou library
126. fuyang library
127. xinxiang library
128. jiaozuo library

survey methods

web-based investigation, content analysis, and interviews with librarians were used to assess 128 public libraries in china. the survey was carried out between march 10 and september 15, 2021. first, the authors identified the media platforms for sharing information about each public library’s contactless services, including an official website, a social networking account on wechat, or a library-developed app. the authors investigated whether these media platforms were updated with information about the contactless services and whether they provided various information about these services. next, the authors searched the various contactless services offered by each library through these media platforms and recorded them. finally, the authors reviewed the data and findings from the survey to minimize errors and ensure the accuracy of the findings.

findings

touch-free information distribution

the distribution of library information is generally carried out in a touch-free manner. there are three commonly used information media in libraries: official website, wechat official account, and library-developed app.
the adoption rate of each information medium by libraries is determined by investigating whether libraries have opened information media platforms and whether the opened platforms are updated with service information. the results showed that the information medium with the highest adoption rate was the wechat official account, reaching 100%. the library’s official website showed an adoption rate of 94%. only 57% of libraries use apps to distribute contactless information (see fig. 1).

figure 1. percentage of touch-free information distribution platforms in large public libraries in china.

patron services must provide timely and convenient access if public libraries want to effectively expand their patron base or increase library usage. wechat is better adapted to user convenience than websites, which explains its higher adoption as a contactless information dissemination tool for libraries. as a public service institution, the chinese public library has an incomparable impact on politics, economy, and culture. libraries have a great influence on the cultural popularization and educational development of the public. therefore, touch-free information dissemination plays an important role in improving the efficiency of information dissemination. wechat has been fully integrated into china’s public library services as a communication tool, allowing libraries to better foster cultural growth. in the process of cultural growth, libraries need to emphasize interactive public participation and combine public culture, social topics, citizen interaction, and media communication, bringing innovative value to promote urban vitality and urban humanism.
the widespread use of wechat helps users stay up to date on the newest information and access library resources and services more conveniently.

remote resources services

restrictions on the use of digital resources are closely related to the frequency of patrons’ use. restrictive measures that posed obstacles to patrons using digital resources were identified. among the 128 large public libraries surveyed, 42% of libraries require reader card authentication by patrons before they can access remote resources services; 8% of libraries do not require users to have reader cards for services. patrons can use the remote resources services available in the remaining 49% of public libraries without needing to register for a user account or patron id on the library website.

to reduce the risk of infection between librarians and patrons, some libraries adopted noncontact paper document delivery services for users in urgent need of paper books during the pandemic. for example, the peking university library’s book delivery to building service (see fig. 2) and xiamen library and wenzhou library’s book delivery to home (see fig. 3) allow patrons to reserve books online, and librarians will express mail the books to patrons’ homes according to their needs.

figure 2. peking university library’s book delivery service to the building.

figure 3. book delivery service of xiamen library and wenzhou library.

contactless services have two outstanding advantages: they can be obtained without contact with people, and they are convenient. however, if the use of remote resources is restricted in many ways, it will lead to a decrease in the utilization of digital resources in libraries.
while intellectual property requirements and concerns must be appropriately managed, public libraries should strive to provide patrons with unlimited access to digital materials and physical print books.

no-touch interaction self-services

no-touch interaction self-services in chinese public libraries mainly include self-checkout, self-retrieval, self-storage, self-printing, self-card registration, and other self-services, such as self-payment and self-reservation of study rooms or seminar rooms (see fig. 4).

figure 4. percentage of large public libraries in china that provide contactless self-service.

the survey of large public libraries in china shows that the majority offer self-checkout and self-retrieval services. the percentage of public libraries offering self-storage, self-card registration, and self-printing is low, at 50% or less. self-storage, as one of the earlier self-services, has a usage rate of 50%. only 34% of public libraries offered self-card registration. the self-service card registration machine has four main functions: reader card registration, payment, password modification, and renewal. for example, when patrons need to pay deposits or overdue fines, they can use the self-service card registration machine to swipe their cards and make payment to facilitate subsequent borrowing of various resources. the machine supports face recognition technology for card application and online deposit recharge, catering to the needs of patrons in many aspects of operation (see fig. 5). self-printing is even less common, available at only 15% of libraries. self-card registration and self-printing are both emerging self-service options that require strong financial and technical support and are therefore not widely available.
figure 4 values: self-checkout 99%, self-retrieval 98%, self-storage 50%, self-card registration 34%, self-printing 15%, others 5%.

figure 5. self-service card registration machine in chinese large public libraries.

to further promote library contactless services and enable users to enjoy self-service library services anytime, anywhere, most public libraries in china have also set up dedicated self-service libraries or microservice halls on the wechat public account platform. for example, the changsha library (see fig. 6) and the taiyuan library (see fig. 7) have both set up a microservice hall column on their wechat public accounts, containing services such as personal appointment, book renewal, event registration, and digital resources. the emergence of online self-service library services has greatly contributed to the development of equalization and standardization of public library services.

figure 6. changsha library no-touch interaction self-service hall.

figure 7. taiyuan library no-touch interaction self-service hall.

24-hour self-service library

the 24-hour self-service library, a contactless phenomenon in china’s public libraries, was introduced in 2006 and officially launched in 2007 by dongguan library and followed by shenzhen library’s initial batch of ten self-service libraries. the success of the shenzhen model has sparked a boom in the construction of self-service libraries in china, with 77% of the chinese public libraries surveyed having opened self-service libraries. self-service libraries follow two service models: space-based self-service libraries (see fig. 8), i.e., unattended libraries with a certain amount of space for use, in which patrons can freely select books and read for leisure, such as 24-hour city bookstores; and cabinet-type self-service libraries (see fig. 9), which resemble a bank atm with an operating panel, look similar to a bookcase, and allow real-time data interaction with the central library via the network. the eight self-service libraries in taiyuan library in shanxi can provide self-service book borrowing services through the new model of library + internet + credit, which allows patrons to apply for a reader’s card without a deposit, make reservations online, and have books delivered to the counter (see fig. 10). by cross-referencing the reader’s card with the patron’s face information, the guangzhou self-service library provides self-service borrowing and returning services for patrons through face recognition. there are many similar self-service libraries in china, which provide various types of patron services in different forms, largely reducing direct contact between patrons and librarians and among patrons. for example, when the pandemic was most severe, data collected from the ningbo self-service library showed that 7,022 physical books were borrowed and returned from january to march 2020, 50% more than in a normal year.25

figure 8. space-based self-service libraries.

figure 9. cabinet-type self-service library.

figure 10. taiyuan self-service library.

the popularity of 24-hour self-service libraries in china is first and foremost due to the strong support and financial investment of government departments in the construction of self-service libraries.
secondly, the features of self-service libraries, which are convenient, time-independent, time-saving, efficient, and diversified, are in line with modern lifestyles, integrating public library services into people’s lives, increasing the visibility and penetration of public library patron services, and better meeting patrons’ reading needs.

network services

there is a wide range of network services, but the most common are seat reservation, online renewal, and overdue fee payment (see fig. 11). the survey found that 89% of chinese public libraries offer at least one of these network services, indicating a high adoption rate. online renewals began to appear in china in 2002 and then gradually became popular. most of the public libraries in china provide this service in the personal library or wechat official account. the adoption rate of online renewal is as high as 85% in the 128 public libraries surveyed. seat reservation is less prevalent: only 28% of the public libraries surveyed offered it.

figure 11. percentage of large chinese public libraries that provide network services.

coverage of the online overdue fee payment service was even lower, with only 21% of public libraries providing access. however, some libraries have replaced the overdue fee system with other methods, such as the shantou library’s lending points system. in this system, the initial number of points on a patron’s account is 100, with two points added for each book borrowed and one point deducted for each day a book is overdue. when the number of points on the account reaches zero, the reader’s card will be frozen for seven days and cannot be used to borrow books.
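the points rules described here, together with the reset to 20 points once the freeze is lifted (stated in the next sentence), can be illustrated with a short python sketch. this is not the shantou library’s actual system: the class and method names are hypothetical, and two behaviors the text does not specify are assumed and marked in comments, namely that the balance never drops below zero and that no further points are deducted while a card is frozen.

```python
from dataclasses import dataclass

# Values taken from the article's description of the lending points system.
INITIAL_POINTS = 100
POINTS_PER_LOAN = 2
PENALTY_PER_OVERDUE_DAY = 1
FREEZE_DAYS = 7
POINTS_AFTER_FREEZE = 20


@dataclass
class LendingAccount:
    """Hypothetical model of one patron's points account."""
    points: int = INITIAL_POINTS
    frozen_days_left: int = 0

    def can_borrow(self) -> bool:
        # A frozen card cannot be used to borrow books.
        return self.frozen_days_left == 0

    def borrow_book(self) -> None:
        if not self.can_borrow():
            raise PermissionError("reader's card is frozen")
        self.points += POINTS_PER_LOAN

    def record_overdue_day(self) -> None:
        # Assumption: no further deductions while the card is frozen.
        if self.frozen_days_left > 0:
            return
        # Assumption: the balance is floored at zero.
        self.points = max(0, self.points - PENALTY_PER_OVERDUE_DAY)
        if self.points == 0:
            self.frozen_days_left = FREEZE_DAYS

    def advance_day(self) -> None:
        # Count down the freeze; reset the balance when it ends.
        if self.frozen_days_left > 0:
            self.frozen_days_left -= 1
            if self.frozen_days_left == 0:
                self.points = POINTS_AFTER_FREEZE


if __name__ == "__main__":
    account = LendingAccount()
    account.borrow_book()
    for _ in range(102):
        account.record_overdue_day()
    print(account.points, account.can_borrow())
```

under these rules, a patron who borrows one book (102 points) and then accumulates 102 overdue days reaches zero, is frozen for seven days, and resumes borrowing with 20 points.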
after the freeze is lifted, the number of points will be reset to 20.26 in summary, contactless services in china’s public libraries are moving in a more humane direction.

online reference services

as a type of contactless service, online reference services are extremely helpful in developing access to documentary information resources. the survey shows that 94% of public libraries provide online reference services. online reference services are available by telephone, website, email, qq, and wechat. telephone reference and website reference are the earliest forms of contactless service, with the highest usage rates of 79% and 71% respectively among public libraries surveyed. this is followed by slightly lower coverage of email reference and qq reference at 55% and 48% respectively. wechat reference coverage is the lowest at only 16% (see fig. 12). qq and wechat are both tencent’s instant messengers, but qq’s file function is slightly stronger than wechat’s. qq can send large files of over 1gb and files do not expire, making it easy for reference librarians to communicate with patrons.

figure 12. percentage of large public libraries in china that provided online reference service tools.

other online reference methods, such as microblog reference and intelligent robot reference, are also present in chinese large public libraries (the “others” category in fig. 12, at 4%). real-time reference is labor-intensive and time-consuming, and where librarians may be unavailable to provide an immediate response, intelligent robotic referencing can compensate for the difficulty of keeping consultants online full time.
applying intelligent robots to library reference can also provide accurate and personalized consultation services according to patrons’ needs and behavioral patterns, greatly improving the quality, effectiveness, and satisfaction of consultation services. for example, the zhejiang library has an online reference service which includes online 24-hour robot reference and offline message modules. patrons can also choose expert reference and see available reference experts in the expert list and their details, including name, library, title, specialties, status, etc.27 in addition, the hunan library provides joint online reference, which is a public welfare platform of the hunan provincial literature and information resources common construction and sharing collaborative network, to provide online reference services to the public. eleven member units, including hunan library, hunan university library, and hunan science and technology information institute, benefit from the rich literature resources, information technology, and human resources of the network, and all sites work together to provide free online reference advice and remote delivery of literature to a wide range of patrons, as well as advisory and tutorial services to guide patrons on how to use the library’s physical and digital resources.28

smart services without personal interactions

driven by artificial intelligence, blockchain, cloud computing, and other technologies, libraries are evolving from physical and digital libraries to smart libraries. smart services without personal interactions are a fundamental capability of smart libraries.
this survey found that the coverage of smart services was 52%, with virtual reality coverage at 21%, face recognition coverage at 20%, and swipe-face-to-borrow services at 9%. face recognition can be used in library resources services, face gates, security monitoring, self-checkout, and other online and offline real-name identity verification instances, which can improve the efficiency of identity verification. the biggest advantage of face recognition is that it is contactless and easy to use, avoiding the health and safety risks associated with contact identification such as fingerprints. swipe face to borrow books is one application of face recognition technology that allows patrons to quickly borrow and return books by scanning their faces, even if they have forgotten their reader’s card. this technology also tracks the interests of patrons based on their borrowing habits and borrowing history, providing them with corresponding reading recommendation services.

it is worth noting that chinese public libraries have a rich variety of smart service methods. in terms of vr technology applications, the national library of china launched the national library virtual reality system in 2008, the first service in china to bring vr technology to the public eye. the virtual reality system provides patrons with the option to explore virtual scenes and interact with virtual resources available in the library. the virtual scenes are built using computer systems that model realistic architectural structures and reading rooms, so that patrons can learn about the library in the library lobby with the help of vr equipment. virtual resources are digital resources presented in virtual form.
the technology combines flash and human gesture recognition systems, allowing patrons to flip through books touch-free at virtual reality reading stations, enhancing the reading style and interactive experience. in addition, the fuzhou library attends to the characteristics of different groups of people and has made virtual experiences a focus of its services, using vr technology to innovate reading methods, such as presenting animal images in 3d form on a computer screen, which has been welcomed by a large number of readers, especially children. shanghai library, tianjin library, shenzhen library, chongqing library, and jinan library have introduced vr technology into their patron services to attract more users. in terms of blockchain applications, the national digital library of china makes use of the special features of blockchain technology in terms of distributed storage, traceable transmission, and high-grade encryption to provide full-time, full-domain, and full-scene copyright protection for massive digital resources and promotes the construction of intelligent library services. in terms of big data technology, the shanghai library provides personalized recommendation services for e-books based on the characteristics of the books borrowed by readers. patrons using a mobile phone can scan a code on borrowed books and click on the recommended book’s cover for immediate reading.29

conclusion & recommendations

an in-depth analysis of the contactless service strategy will help to steadily improve the smart library development process in public libraries and to support their transition to smart libraries. this report provides a systematic framework for contactless services for public libraries based on a survey and assessment of the contactless service status of large public libraries in china.
contactless patron services, contactless space services, contactless self-services, and contactless extension services are the four key components of the framework (see fig. 13).

figure 13. a systematic framework of contactless services for public libraries.

providing contactless patron services

patron services are the heart and soul of each public library. library services delivered without personal physical contact, through touch-free connection with patrons, are referred to as contactless patron services. this includes book lending, online reference, digital resources, and network reading promotion. at present, most chinese public libraries have few contactless lending options, making it difficult to meet the needs of patrons who cannot access the library due to covid-19 or transportation difficulties. therefore, public libraries can enrich their existing book lending methods by providing patrons with contactless services, such as book delivery and online lending, to create a convenient reading environment. a focus on digital resources is fundamental to achieving contactless patron services. at present, some public libraries in china neglect the management of digital resources because of their emphasis on paper resources, and digital resources are not updated and maintained in a timely manner, which prevents patrons from using them smoothly; therefore, the effective management of digital resources in libraries is crucial. in addition, public libraries can carry out activities such as network reading promotion and reader education to effectively improve the utilization of library resources.

building contactless space services

contactless space services refer to the touch-free interaction between physical space and virtual space.
physical space services mainly include self-reservation of study rooms, discussion rooms, and meeting rooms, as well as providing venues for public lectures or exhibitions, etc., to fulfill the space demands arising from patrons’ access to information. virtual space services mainly include building spaces for collaboration and communication, creative spaces, information sharing spaces, and cultural spaces, providing a virtual integrated environment for patrons’ needs for information exchange and acquisition in the online environment. public libraries can develop their activities through different channels according to the characteristics and elements of physical and virtual spaces, so that libraries can evolve from “library as a place” to “library as a platform.” the combination of an offline library space and an online library platform provides a more convenient and accessible library experience for patrons.

implementing no-touch interaction self-services

no-touch interaction self-service plays a pivotal role as one of the service forms of the contactless service strategy. it mainly includes services such as information retrieval, resources navigation, self-checkout, and self-printing. public libraries can set up no-touch interaction self-service sections on their official websites or social media accounts to help patrons quickly access up-to-date information from anywhere and at any time.

developing contactless extension services

contactless extension services refer to the extension of the library along three dimensions: time, space, and approach. public libraries can be open year round on a 24/7 basis or during holidays without librarians, allowing patrons to swipe their own cards to gain access. the traditional collection of paper books should not only be available in offline libraries but can extend to individual self-service libraries or city bookshops.
libraries can approach patrons with a more individualized service strategy. for example, some public libraries provide a service called build a book bag, where librarians select books according to the patron’s personal interests and reading preferences and deliver them to a designated location.

limitations and prospects

after analyzing the current status of contactless services in large public libraries in china, this paper finds that contactless services such as reference and access to digital resources are well established in chinese public libraries. on the other hand, contactless applications such as no-touch interaction self-services, network services, and smart services without personal interaction are less well developed. despite the rapid development of touch-free services and their variety, public libraries in china have not yet implemented a system of contactless services. this paper proposes a systematic framework to improve the development and practice of contactless services in public libraries and interrupt the spread of covid-19. the framework includes four core modules: contactless patron services, contactless space services, contactless self-help services, and contactless extension services. it is foreseeable that contactless services will become the mainstream of public library services in the future.

ontology for the user-learner profile personalizes the search analysis of online learning resources: the case of thematic digital universities

article

marilou kordahi

information technology and libraries | june 2022
https://doi.org/10.6017/ital.v41i2.13601

marilou kordahi (marilou_kordahi@yahoo.fr) is assistant professor, faculty of business administration and management, saint-joseph university of beirut, and associate researcher, paragraph research laboratory, paris 8 university. © 2022.

abstract

we hope to contribute to the field of research in information technology and digital libraries by analyzing the connections between thematic digital universities and digital user-learner profiles. thematic digital universities are similar to digital libraries, and focus on creating and indexing open educational resources, as well as improving learning in the information age. the digital user profile relates to the digital representation of a person’s identity and characteristics. in this paper we present the design of an ontology for the digital user-learner profile (ontoulp) and its application program.
ontoulp is used to structure a user-learner’s digital profile. the application provides each user-learner with tailor-made analyses based on informational behaviors, needs, and preferences. we rely on an exploratory research approach and on methods of ontologies, user modeling, and semantic matching to design the ontoulp and its application program. any user-learner could use the ontoulp and its application program.

introduction

more online learning environments are supporting the creation and dissemination of quality open educational resources (oer) to facilitate change in the education sector, improve education, ensure lifelong learning, and reduce cost, among other motives.1 in 2002, the united nations educational, scientific and cultural organization (unesco) recommended the definition of oer as follows: “the open provision of educational resources, enabled by information and communication technologies, for consultation, use and adaptation by a community of users for non-commercial purposes.”2 the william and flora hewlett foundation defined oer as “freely licensed, remixable learning resources—[they] offer a promising solution to the perennial challenge of delivering high levels of student learning at lower cost.”3 in 2012, unesco noted that oer offer education stakeholders an opportunity to access textbooks and other learning contents to enhance their knowledge and professional experiences.4 education stakeholders may choose oer based on their informational needs, behaviors, and preferences.5

we hope to contribute to the field of research in information technology and digital libraries by analyzing the connections between thematic digital universities and digital user-learner profiles. we are conducting a case study using the digital university engineering and technology.6 in the following we will explain these topics and the interest in the digital university engineering and technology.
in 2003, the french ministry of higher education, research, and innovation initiated the creation of thematic digital universities to facilitate the integration and use of information and communication technologies for education in university teaching practices.7 in total, there are six thematic digital universities, organized by broad disciplines: health sciences and sports, engineering sciences, environment and sustainable development, humanities, economics and management, and technical studies.

thematic digital universities are similar to digital libraries, and focus on creating and indexing oer, as well as improving learning in the information age.8 although thematic digital universities are mostly comprised of oer, they also develop complete training programs with some of these resources (e.g., massive open online courses, or moocs). they are partners with canal-u, the video library for higher education, as well as the french national platform for massive open online courses (fun-mooc). thematic digital universities are mostly created for learners and teachers, as they offer educational resources complementary to bachelor’s, master’s, and doctoral programs.9 to date, learners and teachers have free access to most thematic digital universities and corresponding educational resources. registration is not required; however, without registration neither the learner nor the teacher can analyze her/his search for oer based on informational behaviors, needs, and preferences.10

we will focus on the analysis of oer metadata records in the context of thematic digital universities. each oer in the repository holds a metadata record to precisely describe its specifications to the learner or teacher (e.g., the learning level, language, and topics).
specifications are written according to the institute of electrical and electronics engineers (ieee) standards for learning object metadata (lom),11 lomfr, and suplomfr. lom provides an accurate descriptive schema of a learning object suitable for educational resources12 (e.g., the classification and identification of an educational resource). lomfr and suplomfr are currently applications of lom in the french educational community.13 the digital university engineering and technology attracted our attention because of the following characteristics: clear presentation of its objectives, regular information updates, priority for free access to oer and open data, 3,000 published educational resources, extensive documentation of oer indexing, interoperability of oer and metadata records, and an advanced search engine for oer. each metadata record describes precise information on the oer, including the main title, keywords, descriptive text, educational types (or resources), learning level, copyrights, knowledge domains, topics, authors, and publishers. it is processed and structured with xml language which is human-readable and machine-readable. digital user profiles relate to the digital representation of a person’s identity and characteristics.14 digital identity is the sum of digital traces (or “footprints”) relating to an individual or a community found on the web or in digital systems. digital traces correspond to the user’s profile, browsing history, and contribution actions.15 our focus is the learner who wishes to use the thematic digital universities for tailor-made analysis of retrieved information based on her/his needs and preferences. we offer the learner an option to register on these platforms to track behavior over time while searching for oer. analyses are based on criteria the learner has previously chosen to personalize this search. 
subsequently, we suggest using the term “digital user-learner profile.” we will do our best to respect the general data protection regulation when collecting information on the digital user-learner profile.16 the general data protection regulation is a privacy law drafted and passed by the european union that prohibits the processing, storage, or sharing of certain types of information about individuals without their knowledge and consent.

the research questions are as follows:

1. in the context of thematic digital universities, how can a user-learner personalize the search for open educational resources according to her/his digital profile?
2. in this same context, what kinds of information can a user-learner analyze in a search for open educational resources according to her/his digital profile?

the objectives of this article are to present the preliminary results of work in progress on the design of the ontology for the digital user-learner profile (ontoulp) and its application program, the personalized modeling system for the user-learner profile (psul). we rely on the methods of ontology,17 user modeling,18 and semantic matching.19 the method of ontology is used to describe in a formal manner a set of concepts and objects which represent the meaning of an information system in a specific area and the relationships between these concepts and objects.20 the method of user modeling describes the process of designing and changing a user’s conceptual understanding. it is applied to customize and adjust systems to meet the user’s needs and preferences. the method of semantic matching is used to identify and relate a concept (or class) to its homologous concept in tree-like schemas and to consider the concept’s position in these schemas (e.g., mapping a class in an ontology to homologous concepts in metadata records).
this relationship can be one-to-one or one-to-many.

the ontoulp is a first approach, and it will be used to structure a user-learner’s digital profile in the context of thematic digital universities. we design this ontology for three main reasons: to structure collected and generated information21 (e.g., structuring a user-learner’s learning preferences will enable the identification of learning behaviors and activities), to analyze collected and generated information22 (e.g., analyzing information generated by a user-learner may predict a search for oer), and to facilitate relationships between a user-learner and thematic digital universities23 (e.g., analyzing user-learner informational behaviors may improve oer creation and dissemination).

the psul will be designed as an application program for the ontoulp. it will be used to provide each user-learner with tailor-made analyses based on informational behaviors, needs, and preferences. psul will include a secure database and web pages, namely those for registering and editing the user-learner profile and its dashboard.24 ontoulp and its application program will offer each registered user-learner an opportunity to analyze the search for oer according to informational behaviors and needs.

ontoulp and psul could be implemented in the structure of information systems for educational and research institutions, documentation and information centers, and many others. we will fine-tune our analysis by relying on a case example: the thematic digital universities.

this article comprises six sections. first, we will explain the exploratory research carried out in the context of thematic digital universities. second, we will present the main published works related to the subject of the article. third, we will explain the approach followed to design and write the ontoulp. fourth, we will discuss the creation of the psul application program.
fifth, we will demonstrate the integration of the designed ontology and its application program into a mirror site to perform a technical test. finally, we will discuss the completed work before concluding the article.

exploratory research approach

this exploratory research is based on an analysis of the literature, a semistructured questionnaire, and in-depth documentary research. we check the consistency of collected information and identify the need to personalize the search for oer as well as to make tailor-made analyses of information.

methods used

during the first 18 months of the covid-19 pandemic (november 2020–may 2021), we conducted qualitative research to deepen our comprehension of the practices of thematic digital universities. we collected and interpreted primary and secondary information.

primary information: we contacted the digital university association and their six thematic digital universities.25 because of their extensive expertise and robust knowledge in leading or managing thematic digital universities, directors and general secretaries were chosen to self-administer an electronic semistructured questionnaire. we contacted seven individuals and received six responses. in this questionnaire, we asked about the following topics: the recent knowledge of thematic digital universities, conditions of access to oer, metadata records indexing, and user-learners’ expectations. an example of the questionnaire is included in the appendix.

secondary information: we analyzed a report by the french general inspectorate of the national education and research administration. we have also studied recently published scientific articles by anne boyer (2011), deborah arnold (2018),26 and sihem zghidi and mokhtar ben henda (2020).

the results and findings will be explained in the following paragraphs.
results of information collection

we have compared responses to the questionnaire and contents of published documents and articles. for the digital university in health sciences and sports, “resources are mostly accessible to learners from member universities, through an identification system based on the university email address.”27 only a few resources are open to the public. otherwise, according to comments gathered from the other four digital universities and digital university association, “thematic digital universities are part of global movements providing access to oer by promoting open access to knowledge.”28 they are an opportunity for learners to discover new disciplines and explore new areas.29 in fact, “the process for indexing metadata records meets standards for education, such as lom, lomfr and suplomfr.”30

at present, there is no feedback on the use of thematic digital universities platforms. in other words, “thematic digital universities have no information about learners who view oer, because there is no login and password. this is done on purpose to make them as open as possible.”31 these platforms are considered a means of self-training with quality assurance, as the documents have been produced and validated by higher education teachers. “thematic digital universities provide a certain flexibility allowing learners to work when and where they want.”32

findings

five thematic digital universities and the digital university association responded to the semistructured questionnaire. two thematic digital universities can track user-learners’ behaviors. these digital universities are related to the disciplines of health and sport in addition to technical studies. to date, four thematic digital universities cannot track user-learners’ interactions based on informational behaviors and preferences.
ontoulp and its application program could be implemented in these four thematic digital universities, which are related to the disciplines of engineering sciences, environment and sustainable development, humanities, and economics and management.

literature review

to the best of our knowledge, published research works addressing this research subject are limited in the context of thematic digital universities.33 we analyze the most recent ontologies and user modeling systems that are close to our research objectives. the main works we use are those of bloom et al. (1984),34 smythe et al. (2001),35 green and panzer (2009),36 and kordahi (2020),37 in addition to kelly and belkin (2002). the work methods and field studies these researchers have developed are useful to design the structure of the ontoulp and the model of its application program. in the following paragraphs, we will explain these works and the relationships with this research article.

selection of recently published works

in 2020 and 2021, kordahi designed an ontology and a personalized dashboard for user-learners.38 the objectives of these works were to track individual searches for oer and compare them with a user-learner’s field of work. to design her ontology, kordahi relied on standardized ontologies and validated taxonomies which are used in online learning environments, namely the ims learner information profile (ims lip)39 and bloom’s taxonomy. the personalized dashboard was linked to the user-learner ontology. the designed dashboard was tested technically with its ontology in a digital library environment to examine its performance. kordahi used the methods of ontologies and semantic matching.

learner model

we are mostly interested in the learner model40 as it “is a model of the knowledge, difficulties and misconceptions of the individual [learner].”41 as students learn the educational resources they find, the learner model is updated to display their current progress.
the model can continue to tailor students’ interactions as they learn. there are several learner models, such as the ims lip.42

we examine the ims lip, which is based on a standardized data model describing a learner’s characteristics. it is mainly used to manage a student’s learning history to discover her/his learning opportunities. ims lip comprises 11 categories that gather learning information: “the identification, goals, qualifications and licenses, activity, interest, competency, accessibility, transcript, affiliation, security, and relationships.”43 this model has been successfully used by many renowned researchers (e.g., paquette 2010)44 to design a learner model and then adapt it to appropriate contexts.

ims lip’s reliability, accuracy, and flexibility match well with the ontoulp motives. we will use it to begin designing the structure of the ontoulp and adapt it to the thematic digital universities context. we will also consider the ieee lom, lomfr, and suplomfr classification fields. this measure will be used to improve semantic matching between the ontoulp and oer metadata records.

taxonomy of educational objectives

we examine the user-learner’s educational objectives to meet informational needs and expectations.45 in each oer metadata record, educational objectives are defined based on bloom’s taxonomy (e.g., “understand the context and rules of scientific publication”46). bloom et al. have developed a taxonomy for educational objectives to classify statements of what teachers expected students to learn as a result of lessons and instructions. the researchers described a method for allowing students to achieve educational goals while carrying out exercises utilizing the resources of the environment. bloom et al. relied on in-depth qualitative studies to design and validate this taxonomy.
bloom’s taxonomy contains the following six major categories related to the cognitive domain: knowledge, comprehension, application, analysis, synthesis, and evaluation. this taxonomy was revised in 2001 by lorin anderson et al.47 bloom’s taxonomy is still in use internationally as in the works of kordahi. integrating bloom’s taxonomy into the ontoulp will enhance the structure of a user-learner’s educational objectives. these educational objectives will be organized in six categories allowing the user-learner to refine her/his informational goals. therefore, we will create a mutual link between the user-learner’s educational objectives and oer educational objectives. knowledge domains knowledge organization systems48 are seen as a valuable component for searching for oer.49 our research includes analyses of oer metadata records to establish relationships between their knowledge topics and the user-learner’s topics of interest. in the thematic digital universities’ metadata records, a precise classification is reported respecting both knowledge topics and dewey decimal classification (e.g., geographic information systems (526.028 5)). 50 the dewey decimal classification and relative index 22nd edition,51 published in 2003 by the online computer library center,52 is being used worldwide in digital libraries and by the thematic digital universities.53 in their works published in 2009, green and panzer have developed an ontology to structure knowledge domains.54 this ontology recognizes two classes, which are dewey classes and knowledge topics. we selected the dewey decimal classification for the ontoulp because the thematic digital universities are already using it. we will rely on green and panzer’s ontology to structure the knowledge domains in the ontoulp (e.g., the use of dewey classes and knowledge topics). 
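to make the two-level structure of dewey classes and divisions concrete, the following python sketch derives both levels from a dewey number such as the one quoted above. it is only an illustration: the function name `dewey_levels` is hypothetical and is not part of the ontoulp, and the sketch assumes nothing beyond the standard ddc convention that the first digit identifies the main class and the first two digits identify the division.

```python
import re


def dewey_levels(notation: str) -> tuple[str, str]:
    """Return (main class, division) for a Dewey number such as '526.028 5'."""
    # Only the three digits before the decimal point matter for these two levels.
    match = re.match(r"\s*(\d{3})", notation)
    if match is None:
        raise ValueError(f"not a Dewey number: {notation!r}")
    digits = match.group(1)
    main_class = digits[0] + "00"  # first digit -> one of the ten main classes
    division = digits[:2] + "0"    # first two digits -> one of the hundred divisions
    return main_class, division


print(dewey_levels("526.028 5"))  # ('500', '520')
print(dewey_levels("670"))        # ('600', '670')
```

in an ontology, these two values would correspond to the class and division levels under which a knowledge topic and its subtopics can be attached.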
we will establish relationships between the knowledge domains and user-learner model, allowing the user-learner to choose the most appropriate learning topics.

user modeling system

the “user modeling system for personalized interaction and tailored retrieval” is useful for analyzing each user-learner’s informational needs and preferences.55 kelly and belkin’s system helps the user to track informational needs over time. it contains three classes of models and a set of interactions. the “general behavioral model” tracks information seeking and user behavior to determine informational needs. the “personal behavioral model” characterizes each user’s information search according to specific preferences and behaviors. the “topical models” are associated with concepts related to each user’s informational behaviors. this system was developed by renowned researchers specialized in information retrieval and corresponds to the objectives of this research article. we will use the structure of kelly and belkin’s model (2002) to design the psul application program, in the context of thematic digital universities. relationships between the psul and the ontoulp will be established to carry out personalized analyses of oer search.

ontoulp ontology

ontoulp’s design is based on the works discussed in the previous section. it consists of two stages. we start by writing it. we then describe the ontology and emphasize the relationships between different entities.

writing the ontology

we write ontoulp with the protégé editor and use the hermit inference engine to check the consistency of classes and their relationships with objects. this first version of the ontology is saved in owl format, which is compliant with semantic web technologies.

ontoulp description

the ontology is comprised of five subsystems.
these are: user-learner, user-learner model, educational objectives, learning design, and knowledge domains. each subsystem is composed of classes that inherit the attributes of the subsystem on which they depend. for brevity, the figures show the hierarchical representation of these subsystems.

the user-learner subsystem contains all recorded private information on the digital user-learner profile. the classes personal information, identification sessions, and traces provide information about the user-learner’s behavior and search history for oer, e.g., the search duration for oer (see fig. 1).

the user-learner model subsystem is responsible for structuring collected information related to learning behaviors and needs, namely the classes identification, interest, learning level (or qualifications and licenses), personal preferences (or accessibility), activities, learning objectives (or goals), affiliation, and network of contacts (or relationships). in the context of thematic digital universities, the resulting subsystem is composed of eight classes instead of the eleven ims lip categories. the user-learner model subsystem conveys the structured information to the user-learner subsystem. figure 1 shows the structure of both subsystems, the user-learner and user-learner model.

figure 1. hierarchical representation of both subsystems, the user-learner and user-learner model.

the educational objectives subsystem includes cognitive objectives involved in the process of acquiring knowledge. we design their structure by adapting bloom’s taxonomy. the cognitive objectives class includes six interrelated subclasses: remember (or knowledge), understand (or comprehension), apply (or application), analyze (or analysis), synthesize (or synthesis), and evaluate (or evaluation).
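the six subclasses and their alternative names can be written down as a small lookup structure. the python sketch below is an illustration only, not the ontoulp itself (which is written in owl): the names `CognitiveObjective` and `classify_objective` are hypothetical, and matching on the leading verb of an objective statement is a deliberately naive assumption made for the example.

```python
from enum import Enum
from typing import Optional


class CognitiveObjective(Enum):
    # member name = subclass label; value = alternative name given in parentheses
    REMEMBER = "knowledge"
    UNDERSTAND = "comprehension"
    APPLY = "application"
    ANALYZE = "analysis"
    SYNTHESIZE = "synthesis"
    EVALUATE = "evaluation"


def classify_objective(statement: str) -> Optional[CognitiveObjective]:
    """Naively map an objective statement to a subclass via its leading verb."""
    words = statement.strip().split()
    if not words:
        return None
    # Only an exact match on one of the six subclass labels is recognized.
    return CognitiveObjective.__members__.get(words[0].upper())


objective = "understand the context and rules of scientific publication"
print(classify_objective(objective))  # CognitiveObjective.UNDERSTAND
```

a statement that begins with any other verb returns `None` here; a fuller mapping would need a verb list per subclass, which this sketch does not attempt to supply.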
the cognitive objectives class is enhanced with the ieee lom, lomfr, and suplomfr classification fields, enabling the user-learner to choose the objectives which best describe their needs and preferences, e.g., the class apply has the subclasses design and choose (see fig. 2).

figure 2. hierarchical representation of educational objectives and learning design subsystems.

the learning design subsystem is an adaptation of the ims learning design model, in the context of thematic digital universities.56 the learning design subsystem has two main classes: the user-learner’s environment and learning activities. the environment class has six thematic digital universities as subclasses. in general, information about the environment class comes from the thematic digital universities’ platforms (e.g., the viewed metadata records). the learning activities class has resources as a subclass. the resources subclass is also enriched with the ieee lom, lomfr, and suplomfr classification fields to complete its structure and meet the user-learner’s needs and expectations. further, we have connected the learning activities with cognitive objectives classes to ensure continuity between them (e.g., the subclass experimentation is associated with the subclass analyze). figure 2 illustrates the main structure of both subsystems, the educational objectives and learning design.

the knowledge domains subsystem contains the main class dewey decimal classification and the class contacts. the main class has two subclasses: dewey classes, with the corresponding divisions as subclasses, and knowledge topics, with the corresponding subtopics as subclasses (e.g., the science topic corresponds to dewey class 500, and the manufacturing subtopic corresponds to division 670).

figure 3. hierarchical representation of the subsystem knowledge domains.

the subclass knowledge topics is related to the subclass user-learner’s learning topics to improve informational behavior analyses. the class contacts is linked to the subclass user-learner’s network of contacts to analyze the strength or weakness of networks between the user-learner and oer publishers/authors (see fig. 1). the subsystem knowledge domains can deal with questions which belong to different levels in the ontoulp. for example, which learning topics is the user-learner looking for? which network of contacts is the user-learner interested in? what are the activities related to the user-learner’s learning topics? which searched keywords relate to the user-learner’s learning topics?57 in figure 3, we show some of the subsystem’s elements.

personalized modeling system for the user-learner profile

the psul is based on the works discussed in the previous sections. it is written in php, javascript, and xml, which are languages for the web. this new modeling system comprises three classes of models: the general behavioral, personal behavioral, and topical (see fig. 4).

the general behavioral model has two roles. it registers a user-learner’s digital profile in order to determine informational needs and preferences for oer. it also collects the informational behaviors of a user-learner while viewing oer metadata records for tailor-made analyses. the general behavioral model includes the ontology ontoulp as well as user-learner registration and editing pages. the registration page contains relevant information about a user-learner, an option to accept or reject data collection, and a list of choices for behavioral analyses. once registered, the user-learner can modify her/his profile from the editing page. both pages are mapped to the ontoulp to populate criteria fields.
the user-learner profile information is stored in a secure database (as described in the introduction).

the personal behavioral model is used to analyze information according to the registered digital user-learner profile and informational behaviors. it contains a set of queries to collect and tailor information for each user-learner. the sources of information are the general behavioral model and oer metadata records. this model is designed based on analyses of the general behavioral model. when a user-learner begins searching for oer, the general behavioral model provides the personal behavioral model with all profile information as well as the history of oer search. this information is transmitted to adjust the personalized user-learner profile. the user-learner profile changes as the personal behavioral model receives more information from the general behavioral model. informational interactions connect the personal behavioral model to topical models.

the topical models bring together all analyses of oer search for each user-learner.58 they are inferred from the personal behavioral model. informational interactions connect the topical models to the general behavioral model. for now, we have designed four topical models and present their outcomes in the user-learner dashboard page. this page may be used as a practical dashboard providing feedback to each user-learner, who can use these analyses to adjust or make changes in the profile or the oer search.

topical model 1 is used to synthesize each user-learner’s search history and to suggest a profile adjustment. the suggested adjustment is based on analyses of user-learner behavioral trends.59

topical model 2 allows each user-learner to examine the list of knowledge topics which have caught her/his attention. it contains two separate lists describing viewed oer metadata records and matching them to the chosen topics of interest.
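as a minimal sketch of the flow just described, the python snippet below takes a registered profile and a list of viewed oer metadata records (standing in for what the general behavioral model hands to the personal behavioral model) and derives two lists in the spirit of topical model 2. this is one possible reading of "two separate lists", not the psul implementation, which the article says is written in php, javascript, and xml. the field names knowledge_topic and title, the function name, and the sample records are all assumptions made for illustration.

```python
def topical_model_2(chosen_topics, viewed_records):
    """Split viewed records into those matching the chosen topics and the rest.

    `chosen_topics` is a list of topic names from the profile; each record is a
    dict with hypothetical keys "title" and "knowledge_topic".
    """
    chosen = {topic.lower() for topic in chosen_topics}
    matching, other = [], []
    for record in viewed_records:
        topic = record["knowledge_topic"].lower()
        # Route the record title to one of the two lists.
        (matching if topic in chosen else other).append(record["title"])
    return matching, other


# Hypothetical profile and viewing history for a fictitious user-learner.
profile = {"topics_of_interest": ["technology", "management and public relations"]}
viewed = [
    {"title": "record a", "knowledge_topic": "technology"},
    {"title": "record b", "knowledge_topic": "science"},
    {"title": "record c", "knowledge_topic": "management and public relations"},
]

matching, other = topical_model_2(profile["topics_of_interest"], viewed)
```

keeping the profile and the viewing history as separate inputs mirrors the article's separation between registered preferences and collected behavior.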
topical model 3 shows comparative analyses between the user-learner’s preference criteria and viewed metadata records. the user-learner can interact with this model by comparing the chosen topics of interest to the viewed knowledge topics. the user-learner can also compare the chosen learning activities to the viewed teaching pedagogies. the teaching pedagogies as well as knowledge topics are extracted from oer metadata records (see fig. 5a).

topical model 4 highlights each user-learner’s interest based on the keyword search volume. the user-learner can interact with this model by studying the relationships between searched keywords and chosen topics of interest (see fig. 5a and fig. 5b). figure 4 shows the diagram of psul as explained in this section.

figure 4. the psul diagram based on kelly and belkin’s system (2002).60

ontoulp and its modeling system in the context of a thematic digital university

for now, ontoulp and its application program are implemented in the digital university engineering and technology private platform, which is hosted on a private server. we conducted a technical test mainly to assess ontoulp’s precision and performance. the digital university’s team sent us a complete archive of their oer metadata records. these oer metadata records are saved on the private server with the digital university engineering and technology platform. once a user-learner is registered on this platform, she/he can carry out actions through the psul, for example, a search by keyword, personalization of the profile, tailor-made analysis of oer search, and visualization of analyses in the dashboard.

figure 5a. screenshot of a section of the dashboard. the bar chart shows comparative analyses between a user-learner’s topic of interest and knowledge topics. the knowledge topics are extracted from the viewed oer metadata records.

figure 5b. screenshot of a section of the dashboard. the pie chart highlights a user-learner’s interest based on keyword search volume. the bar chart shows comparative analyses between a user-learner’s learning activities and viewed teaching pedagogies. the keywords are extracted from the search. the teaching pedagogies are extracted from oer metadata records.

for brevity, figures 5a and 5b show only brief results of a technical test. in this example, the user-learner’s identity is fictitious; the user-learner’s persona is a construct.61 in other words, the user-learner’s identity is not real; it is fabricated to conduct and complete the technical test. when registering, this user-learner selected the technology topic (dewey class 600) in addition to the management and public relations subtopic (dewey division 650). this user-learner also selected all topical models. during a viewing session, this user-learner chose to search for oer using a few keywords. the keywords were chosen according to the user-learner’s profile and in order to continue the technical test.

discussion and conclusion

the ontology for the digital user-learner profile is a first approach based on the semantic web. it is designed for the personalization of interactions and retrieval of tailored information. we have combined standardized and validated resources, such as the ims lip, bloom’s taxonomy, and knowledge domains ontology, to allow the user-learner’s search analyses.
we have discussed the design of a new application program prototype allowing a user-learner to analyze the search for oer according to her/his digital profile. psul provides automated real-time feedback based on the user-learner’s search history and the information she/he has entered about herself/himself. we have then demonstrated the integration of the ontoulp and psul into a mirror site to perform a technical test.

the ontology’s main characteristics are flexibility and adaptability. while designing ontoulp, we have reused or restructured resources to allow its use in other thematic digital universities and online learning environments, including digital libraries. another advantage of ontoulp is the application of several information processing techniques. for example, a registered user-learner can self-assess her/his search for oer by keywords. she/he can also analyze the relevance of the search for oer through the psul.

we have overcome three essential limitations. the first limitation concerns the literature on the subject (see literature review section). while contributing to the field of research in information technology and digital libraries, this work has also drawn on disciplines as diverse as education and the cognitive, social, and human sciences. the terminological definitions of disciplines, concepts, and even methods vary over decades or centuries, and among groups of researchers. we have made every effort to define the different terms correctly and to cite the corresponding researchers.

the second limitation relates to the design of ontoulp. published works dealing with this topic are rare. we used an exploratory research approach and the published works of renowned international researchers to fine-tune our study (see the exploratory research approach and literature review sections). we then determined the classes and objects as well as the relationships between them.
the third limitation concerns the design of the psul in accordance with the thematic digital universities’ policies and the general data protection regulation. in accordance with the regulation, we have made both registration to thematic digital universities and the collection of information on the digital user-learner profile optional. thus, the user-learner will always have the possibility of registering on these platforms to obtain a tailor-made information analysis according to the digital profile.

as we conclude our work, we plan to focus our research and initiatives on the following areas. firstly, we will deepen our study of ontoulp classes to increase their precision. we will also examine the search personalization of oer based on uses and practices of algorithms in the ontoulp.62 for example, by relying on a newer version of the ontology, we will identify the topics which may interest a specific user-learner. we will implement this newer version in some thematic digital universities to perform technical tests.

secondly, we will conduct qualitative and quantitative studies to analyze participants’ behavior while using ontoulp and its application program, in the context of thematic digital universities. for example, we will examine how many participants would choose to use the ontoulp and psul as well as how many would not (e.g., the usefulness of ontologies to participants). we will analyze the behavior of individuals with digital personae and make connections between their searches for oer.63 we will study their profiles, behaviors, and interests to ultimately suggest oer (e.g., the use of recommendation systems). we will also analyze how participants’ behavior and feedback may affect future findings. participants would be selected in advance to contribute to these studies.
thirdly, we will study the effects of ontoulp and psul practices on the thematic digital universities. this study will concern an analysis of the thematic digital universities’ search engines and user-learners’ needs. for example, exploratory research will allow us to better understand user-learners’ informational needs and expectations when using the oer search engines. we will analyze the design of oer search engines considering these informational needs and expectations. we will then use these findings to suggest alternatives to the thematic digital universities to further improve these search engines.

acknowledgments

we thank the digital university association and thematic digital universities for their elaborate and enlightening explanations concerning the platforms. we thank the reviewers and claude baltz, emeritus professor in information and communication sciences at the paris 8 university, for carefully reviewing this article and for enriching it with their expert observations. thanks to mohammad hajj hussein, communication and it engineer, for his help programming the dashboard.

appendix: semistructured questionnaire example

email subject: digital university engineering and technology

dear sir, madam,

i am affiliated to the paragraph research laboratory at the paris 8 university (laboratoire de recherche paragraphe, université paris 8). i am writing to you to gather further information concerning the digital university engineering and technology. the objective of the semistructured questionnaire is to deepen my comprehension of the practices of digital university engineering and technology in order to write a research article and contribute to its improvement. i would be grateful if you could answer the following questions:

• what are your responsibilities at the digital university engineering and technology?
• do the thematic digital universities as well as digital university engineering and technology provide “open” educational resources?
• are the educational resources accessible only to students enrolled in the training programs of partner universities?
• how is the access to educational resources made?
• do the educational resources follow document processing for their indexing?
• is the document processing specific to the thematic digital universities?
• what are the expectations of “users” searching for educational resources?

thank you in anticipation

sincerely yours,
marilou kordahi

endnotes

1 “cape town open education declaration: unlocking the promise of open educational resources,” 2007, http://www.capetowndeclaration.org/read-the-declaration.
2 unesco, “forum on the impact of open courseware for higher education in developing countries,” (2002): 24, http://unesdoc.unesco.org/images/0012/001285/128515e.pdf.
3 william and flora hewlett foundation, “open education,” accessed april 5, 2022, https://hewlett.org/strategy/open-education.
4 unesco, “2012 paris oer declaration,” 2012, http://www.unesco.org/new/fileadmin/multimedia/hq/ci/wpfd2009/english_declaration.html.
5 camille thomas, kimberly vardeman, and jingjing wu, “user experience testing in the open textbook adaptation workflow,” information technology and libraries journal 40, no. 1 (2021): 1–18, https://doi.org/10.6017/ital.v40i1.12039.
6 digital university engineering and technology, “open educational resources for engineering and technology,” accessed april 5, 2022, https://unit.eu.
7 jean delpech de saint guilhem, sonia dubourg-lavroff, and jean-yves de longueau, “thematic digital universities,” general inspectorate of the national education and research administration, 2016, https://www.enseignementsup-recherche.gouv.fr/cid104387/www.enseignementsup-recherche.gouv.fr/cid104387/les-universites-numeriques-thematiques.html.
8 asim ullah, shah khusro, and irfan ullah, “bibliographic classification in the digital age: current trends & future directions,” information technology and libraries 36, no. 3 (2017): 48–77, https://doi.org/10.6017/ital.v36i3.8930; anne boyer, “thematic digital universities: report,” sciences et technologies de l'information et de la communication pour l'éducation et la formation 18, no. 1 (2011): 39–52.
9 sihem zghidi and mokhtar ben henda, “open educational resources and open archives in the open access movement: an educational engineering and scientific research crossed analysis,” distances and mediations of knowledge 31 (2020), https://doi.org/10.4000/dms.5347.
10 diane kelly and nicholas j. belkin, “a user modeling system for personalized interaction and tailored retrieval in interactive ir,” proceedings of the american society for information science and technology 39, no. 1 (2002): 316–25, https://doi.org/10.1002/meet.1450390135.
11 ieee learning technology standards committee, “learning object metadata, final draft standard, 1484.12.1-2002,” http://ltsc.ieee.org/wg12.
12 gregory m. shreve and marcia lei zeng, “integrating resource metadata and domain markup in an nsdl collection,” in international conference on dublin core and metadata applications (2003): 223–29.
13 french standardization association, “description standard for the field of education in france – part 1: description of learning resources (nodefr-1), nf z76-041,” 2019.
14 arthur allison, james currall, michael moss, and susan stuart, “digital identity matters,” journal of the american society for information science and technology 56, no. 4 (2005): 364–72, https://doi.org/10.1002/asi.20112.
15 katalin feher, “digital identity and the online self: footprint strategies – an exploratory and comparative research study,” journal of information science 47, no. 2 (2021): 192–205, https://doi.org/10.1177/0165551519879702.
16 robyn caplan and danah boyd, “who controls the public sphere in an era of algorithms,” mediation, automation, power (2016), https://www.datasociety.net/pubs/ap/mediationautomationpower_2016.pdf.
17 thomas r. gruber, “a translation approach to portable ontology specifications,” knowledge acquisition 5, no. 2 (1993): 199–220, https://doi.org/10.1006/knac.1993.1008.
18 gerhard fischer, “user modeling in human–computer interaction,” user modeling and user-adapted interaction 11, no. 1 (2001): 65–86, https://doi.org/10.1023/a:1011145532042.
19 yannia kalfoglou and marco schorlemmer, “ontology mapping: the state of the art,” the knowledge engineering review 18, no. 1 (2003): 1–31, https://doi.org/10.1017/s0269888903000651.
20 tom gruber, “collective knowledge systems: where the social web meets the semantic web,” web semantics: science, services and agents on the world wide web 6, no. 1 (2008): 4–13, https://doi.org/10.1016/j.websem.2007.11.011.
21 peter ingwersen, “search procedures in the library – analysed from the cognitive point of view,” journal of documentation 38, no. 3 (1982): 165–97, https://doi.org/10.1108/eb026727.
22 tefko saracevic, amanda spink, and mei-mei wu, “users and intermediaries in information retrieval: what are they talking about?” in user modeling: proceedings of the sixth international conference (vienna: springer, 1997): 43–54.
23 núria ferran, enric mor, and julià minguillón, “towards personalization in digital libraries through ontologies,” library management 26, no. 4/5 (2005): 206–17, https://doi.org/10.1108/01435120510596062.
24 katrien verbert, erik duval, joris klerkx, sten govaerts, and josé luis santos, “learning analytics dashboard applications,” american behavioral scientist 57, no. 10 (2013): 1500–1509, https://doi.org/10.1177/0002764213479363.
25 digital university association, “open educational resources for all,” accessed april 5, 2022, https://univ-numerique.fr.
26 deborah arnold, “the french thematic digital universities – a 360° perspective on open and digital learning,” in european distance and e-learning network conference proceedings, no. 1 (2018): 370–78.
27 director of the digital university in health and sport messaged author, may 3, 2021.
28 director of the virtual university of environment and sustainable development messaged author, january 6, 2021.
29 director of the digital university in economics and management messaged author, december 8, 2020.
30 general secretary of the open university of the humanities messaged author, may 1, 2021.
31 member of digital university association messaged author, december 18, 2020.
32 director of the digital university engineering and technology messaged author, december 11, 2020.
33 laecio araujo costa, leandro manuel pereira sanches, ricardo josé rocha amorim, laís do nascimento salvador, and marlo vieira dos santos souza, “monitoring academic performance based on learning analytics and ontology: a systematic review,” informatics in education 19, no. 3 (2020): 361–97.
34 benjamin s. bloom, david r. krathwohl, and bertram b. masia, taxonomy of educational objectives: the classification of educational goals (new york: longman, 1984).
35 colin smythe, frank tansey, and robby robson, “ims learner information package. best practice & implementation guide,” ims global learning consortium, 2001.
36 rebecca green and michael panzer, “the ontological character of classes in the dewey decimal classification,” the library, (2009), https://www.ergon-verlag.de/isko_ko/downloads/aiko_vol_12_2010_25.pdf.
37 marilou kordahi, « le changement de l’apprentissage, l’ontologie du profil de l’utilisateur-apprenant, » management des technologies organisationnelles, 10 (2020): 73–88.
38 marilou kordahi, “information literacy: ontology structures user-learner profile in online learning environment,” in seventh european conference on information literacy, (2021): 130, http://ecil2021.ilconf.org/wp-content/uploads/sites/9/2021/09/ecil2021_book_of_abstracts_final_v3.pdf#page=149.
39 “ims learner information package accessibility for lip best practice and implementation guide,” ims global learning consortium, last revised june 18, 2003, https://www.imsglobal.org/accessibility/acclipv1p0/imsacclip_bestv1p0.html.
40 judy kay, “learner know thyself: student models to give learner control and responsibility,” in proceedings of international conference on computers in education (1997): 17–24.
41 susan bull, “supporting learning with open learner models,” in proceedings of 4th hellenic conference with international participation information and communication technologies in education (2004): 47–61.
42 peter dolog and wolfgang nejdl, “challenges and benefits of the semantic web for user modelling,” in proceedings of the workshop on adaptive hypermedia and adaptive web-based systems (ah2003) at 12th international world wide web conference (2003).
43 ims global learning consortium, “ims learner information package accessibility for lip best practice and implementation guide,” para. 2.
44 gilbert paquette, “ontology-based educational modelling-making ims-ld visual,” technology, instruction, cognition & learning 7, no. 3–4 (2010): 263–93.
45 john seely brown and richard p. adler, “open education, the long tail, and learning 2.0,” educause review 43, no. 1 (2008): 16–20.
46 open university of the humanities, “how to write and publish a scientific article,” accessed on april 5, 2022, https://uoh.fr/front/noticefr/?uuid=6a063dd7-3a02-482a-9857-934501f7c82d.
47 lorin w. anderson, david r. krathwohl, peter w. airiasian, kathleen a. cruikshank, richard e. mayer, paul r. pintrich, james raths, and merlin c. wittrock, a taxonomy for learning, teaching and assessing: a revision of bloom’s taxonomy of educational objectives (new york: longman publishing group, 2001).
48 birger hjørland, “theories are knowledge organizing systems (kos),” knowledge organization 42, no. 2 (2017): 113–28, https://doi.org/10.5771/0943-7444-2015-2-113.
49 walter moreira and daniel martínez-ávila, “concept relationships in knowledge organization systems: elements for analysis and common research among fields,” cataloging & classification quarterly 56, no. 1 (2018): 19–39, https://doi.org/10.1080/01639374.2017.1357157.
50 wayne a. wiegand, “the ‘amherst method’: the origins of the dewey decimal classification scheme,” libraries & culture 33, no. 2 (1998): 175–94.
51 melvil dewey, dewey decimal classification and relative index, ed. joan s. mitchell, julianne beall, giles martin, and winton e. matthews, 22nd ed. (dublin, ohio: oclc, 2003).
52 joan s. mitchell, “ddc 22: dewey in the world, the world in dewey,” advances in knowledge organization 9 (2004): 139–45.
53 hamid saeed and abdus sattar chaudhry, “using dewey decimal classification scheme (ddc) for building taxonomies for knowledge organisation,” journal of documentation 58, no. 5 (2002): 575–83.
54 rebecca green and michael panzer, “the interplay of big data, worldcat, and dewey,” advances in classification research online 24, no. 1 (2013): 51–58.
55 kelly and belkin, “a user modeling system,” 319.
56 rob koper and colin tattersall, eds., learning design: a handbook on modelling and delivering networked education and training (heidelberg: springer science and business media, 2005).
57 david beer, “envisioning the power of data analytics,” information, communication & society 21, no. 3 (2018): 465–79, https://doi.org/10.1080/1369118x.2017.1289232.
58 charles lang, george siemens, alyssa wise, and dragan gasevic, eds., handbook of learning analytics (society for learning analytics and research, 2017), https://doi.org/10.18608/hla17.
59 joris klerkx, katrien verbert, and erik duval, “learning analytics dashboards,” in handbook of learning analytics, ed. charles lang, george siemens, alyssa wise, and dragan gasevic (society for learning analytics and research, 2017), https://doi.org/10.18608/hla17, 143–50.
60 kelly and belkin, “a user modeling system,” 319.
61 roger clarke, “the digital persona and its application to data surveillance,” the information society 10, no. 2 (1994): 77–92, https://doi.org/10.1080/01972243.1994.9960160.
62 ahu sieg, bamshad mobasher, and robin burke, “web search personalization with ontological user profiles,” in proceedings of the sixteenth acm conference on conference on information and knowledge management (2007): 525–34, https://doi.org/10.1145/1321440.1321515.
63 roger clarke, “persona missing, feared drowned: the digital persona concept, two decades later,” information technology & people 27, no. 2 (2014): 182–207, https://doi.org/10.1108/itp-04-2013-0073.

article

explainable artificial intelligence (xai): adoption and advocacy

michael ridley

information technology and libraries | june 2022 https://doi.org/10.6017/ital.v41i2.14683

michael ridley (mridley@uoguelph.ca) is librarian, university of guelph. © 2022.

abstract

the field of explainable artificial intelligence (xai) advances techniques, processes, and strategies that provide explanations for the predictions, recommendations, and decisions of opaque and complex machine learning systems. increasingly academic libraries are providing library users with systems, services, and collections created and delivered by machine learning.
academic libraries should adopt xai as a tool set to verify and validate these resources, and advocate for public policy regarding xai that serves libraries, the academy, and the public interest. introduction explainable artificial intelligence (xai) is a subfield of artificial intelligence (ai) that provides explanations for the predictions, recommendations, and decisions of intelligent systems.1 machine learning is rapidly becoming an integral part of academic libraries. xai is a set of techniques, processes, and strategies that libraries should adopt and advocate for to ensure that machine learning appropriately serves librarianship, the academy, and the public interest. knowingly or not, libraries acquire and provide access to systems, services, and collections infused and directed by machine learning methods, and library users are engaged in information behavior (e.g., seeking, using, managing) facilitated or augmented by machine learning. machine learning in library and information science (lis), as with many other fields, has become ubiquitous. however, this technology is often opaque and complex, yet consequential. there are significant concerns about bias, unfairness, and veracity.2 there are troubling questions about user agency and power imbalances.3 while lis has a long-standing interest in ai and intelligent information systems generally, 4 it has only recently turned its attention to xai and how it affects the field and how the field might influence it.5 xai is a critical lens through which to view machine learning in libraries. it is also a set of techniques, processes, and strategies essential to influencing and shaping this stil l emerging technology: research libraries have a unique and important opportunity to shape the development, deployment, and use of intelligent systems in a manner consistent with the values of scholarship and librarianship. 
the area of explainable artificial intelligence is only one component of this, but in many ways, it may be the most important.6

dismissing engagement with xai because it is “highly technical and impenetrable to those outside that community” is neither acceptable nor, increasingly, possible.7 artificial intelligence is the essential substrate of contemporary information systems and xai is a tool set for critical assessment and accountability. the details matter and must be understood if libraries are to have a place at the table as xai and machine learning evolve and further deepen their effect on lis.

this paper provides an overview of xai with key definitions, a historical context, and examples of xai techniques, strategies, and processes that form the basis of the field. it considers areas where xai and academic libraries intersect. the dual emphasis is on xai as a toolset for libraries to adopt and xai as an area for public policy advocacy.

what is xai?

xai is plagued by definitional problems.8 some definitions are focused solely and narrowly on the technical concepts while others focus only on the broad social and political dimensions. lacking “a theory of explainable ai, with a formal and universally agreed definition of what explanations are,”9 the fundamentals of this field are still being explored, often from different disciplinary perspectives.10 critical algorithm studies position machine learning as socio-techno-informational systems.11 as such, a definition of xai must encompass not just the techniques, as important and necessary as they are, but also the context within which xai operates. the us defense advanced research projects agency (darpa) description of xai captures the breadth and scope of the field.
the purpose of xai is for ai systems to have “the ability to explain their rationale, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future”12 and to “enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.”13

xai is needed to:

1. generate trust, transparency, and understanding;
2. ensure compliance with regulations and legislation;
3. mitigate risk;
4. generate accountable, reliable, and sound models for justification;
5. minimize or mitigate bias, unfairness, and misinterpretation in model performance and interpretation; and
6. validate models and validate explanations generated by xai.14

xai consists of testable and unambiguous proofs, various verification and validation methods that assess influence and veracity, and authorizations that define requirements or mandate auditing within a public policy framework.

xai is not a new consideration. explainability has been a preoccupation of computer science since the early days of expert systems in the late twentieth century.15 however, the 2018 introduction of the general data protection regulation (gdpr) by the european union (eu) shifted explainability from a purely technical issue to one with an additional and urgent focus on public policy.16 while the presence of a “right to explanation” in the gdpr is highly contested,17 industry groups and jurisdictions beyond the eu recognized its inevitability, spurring an explosion in xai research and development.18

types of xai

taxonomies classify xai types by their scope and mechanism.19 local explanations interpret the decisions of a machine learning model used in a specific instance (i.e., involving data and context relevant to the circumstance). global explanations interpret the model more generally (i.e., involving all the training data and relevant contexts).
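the local/global distinction can be made concrete with a deliberately small example. the sketch below assumes a hypothetical linear relevance score over three invented features (`title_match`, `recency`, `popularity`); none of these names or weights come from a real system. for such a model the global explanation is simply the ranked weights, while the local explanation is each feature's contribution to one particular score.

```python
# hypothetical weights for an illustrative linear relevance model
# (assumed for this sketch, not taken from any real system)
WEIGHTS = {"title_match": 2.0, "recency": 0.5, "popularity": 1.0}


def score(item):
    """Linear score: the sum of weight * feature value."""
    return sum(WEIGHTS[name] * item[name] for name in WEIGHTS)


def global_explanation():
    """Rank features by absolute weight: how the model behaves in general."""
    return sorted(WEIGHTS, key=lambda name: abs(WEIGHTS[name]), reverse=True)


def local_explanation(item):
    """Per-feature contribution to the score of one specific item."""
    return {name: WEIGHTS[name] * item[name] for name in WEIGHTS}


# one invented item: weak title match, fairly recent, very popular
item = {"title_match": 0.1, "recency": 0.8, "popularity": 0.9}
print("global ranking:", global_explanation())
print("local contributions:", local_explanation(item))
```

for this item the largest contribution comes from `popularity` even though `title_match` carries the largest weight overall, which is why a local explanation can differ from the global one.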
in black-box or model-agnostic explanations, only the input and the output of the machine learning model are required, while white-box or model-specific explanations require more detailed information regarding the processing or design of the model.

another way to categorize xai is as proofs, validations, and authorizations. proofs are testable, traceable, and unambiguous explanations demonstrable through causal links, logic statements, or transparent processes. typically, proofs are only available for ai systems that use “inherently interpretable” techniques such as rules, decision trees, or linear regressions.20 validations are explanations that confirm the veracity of the ai system. these verifications occur through testing procedures, reproducibility, approximations and abstractions, and justifications. authorizations are explanations that result from processes in which third parties provide some form of standard, ratification, prohibition, or audit. authorizations might pertain to the ai model, its operation in specific instances, or even the process by which the ai was created. they can be provided by professional groups, nongovernmental organizations, governments and government agencies, and third parties in the public and private sector.

academic libraries can adopt proofs and validations as means to interrogate information systems and resources. this includes collections which are increasingly machine learning systems themselves or developed with machine learning methods. the recognition of “collections as data” is an important shift in this direction.21 where appropriate, proofs and validations should accompany content and systems derived from machine learning. libraries must also engage with xai as authorizations to assess the public policy implications that exist, are emergent, or are necessary. library advocacy is currently lacking in this area.
the requirement for policy and governance frameworks is a reminder that machine learning is “far from being purely mechanistic, it is deeply, inescapably human”22 and that while complex and opaque “the ‘black box’ is full of people.”23

prerequisites to an xai strategy

three questions are important for any xai strategy:

• what constitutes a good explanation?
• who is the explanation for?
• how will the explanation be provided?

explanations are context specific. the “goodness” of an explanation is dependent on the needs and objectives of the explainee (a user) and the explainer (an xai). following research from the fields of psychology and cognitive science, keil suggests five reasons why someone wants an explanation: (1) to predict similar events in the future, (2) to diagnose, (3) to assess blame or guilt, (4) to justify or rationalize an action, and (5) for aesthetic pleasure.24 for most people, explanations need not be complete or even fully accurate.25

as a result, who the explanation is for is critical to a good explanation. different audiences have different priorities. system developers are primarily interested in performance explanations while clients focus on effectiveness or efficacy, professionals are concerned about veracity, and regulators are interested in policy implications. nonexpert, lay users of a system want explanations that build trust and provide accountability.

a good explanation is also affected by its presentation. there are temporal and format considerations. explanations can be provided or available in real time and continuously as the process occurs (hence partial explanations) or post hoc and in summary form. interactive explanations are widely preferred but are not always appropriate or actionable.26 studies have compared textual, visual, and multimodal formats with differing results.
familiar textual responses or simple visual explanations such as venn diagrams are often most effective for nonexpert users.27

drawing from philosophy, psychology, and cognitive science, miller recommends four approaches for xai.28 explanations are contrastive. when people want to know the “why” of something, “people do not ask why event p happened, but rather why event p happened instead of some event q.” explanations are selected. “humans are adept at selecting one or two causes from a sometimes infinite number of causes to be the explanation.” explanations are social. “they are a transfer of knowledge, presented as part of a conversation or interaction, and are thus presented relative to the explainer’s beliefs about the explainee’s beliefs.” finally, miller cautions against using probabilities and statistical relationships and encourages references to causes.

burrell identifies three key barriers to explainability: concealment, the limited technical understanding of the user, and an incompatibility between the user (human) and algorithmic reasoning.29 while concealment is deliberate, it may or may not be justified. protecting ip and trade secrets is acceptable while obscuring processes to purposively deceive users is not. regulations are a tool to moderate the former and minimize the latter.

the technical limitations of users and the incompatibility between users and algorithms suggest two remedies. first is enhancing algorithmic literacy.
algorithmic literacy is “a set of competencies that enables individuals to critically evaluate ai technologies; communicate and collaborate effectively with ai; and use ai as a tool online, at home, and in the workplace.”30 libraries have a key role in advancing algorithmic literacy in their communities.31 just as libraries championed information literacy through the promulgation of standards and principles, the provision of diverse educational programming, and the engagement of the broad academic community, so too can libraries be central to efforts to enhance algorithmic literacy.

second is a requirement that xai must be sensitive to the abilities and needs of different users. a survey of the key challenges and research directions of xai identified 39 issues, including the need to understand and enhance the user experience, match xai to user expertise, and explain the competencies of ai systems to users.32 this is the essence of human-centered explainable ai (hcxai). among hcxai principles are the importance of context (regarding user objectives, decision consequences, timing, modality, and intended audience), the value of using hybrid explanation methods that complement and extend each other, and the power of contrastive examples and approaches.33

proofs and validations

xai that provides proofs or validations can be adopted by libraries to assess and evaluate machine learning utilized in systems, services, and collections. since proofs pertain to already interpretable systems, the four examples provided focus on validations: feature audit, approximation and abstraction, reproducibility, and xai by ai.

these techniques may require access to, or information about, the machine learning model. this would include such characteristics as the algorithms used, settings of the parameters and hyperparameters, optimization choices, and the training data.
while all these may not be normally available, designers of machine learning systems in consequential settings should expect to provide, indeed be required to provide, such access. similarly, vendors of library content or systems utilizing machine learning should make explanatory proofs and validations available for library inspection.

feature audit

feature audit is an explanatory strategy that attempts to reveal the key features (e.g., characteristics of the data or settings of the hyperparameters used to differentiate the data) that have a primary role in the prediction of the algorithm. by isolating these features, it is possible to explain the key components of the decision. feature audit is a standard technique of linear regression, but it is made more difficult in machine learning because of the complexity of the information space (e.g., billions of parameters and high dimensionality). there are various feature audit techniques34 but all of them are “decompositional” in that they attempt to reduce the work of the algorithm to its component parts and then use those results as an explanation.35 feature audit can highlight bias or inaccuracy by revealing incongruence between the data and the prediction.

more advanced feature audit techniques (e.g., gradient feature auditing) recognize that features can indirectly influence other features and that these features are not easily detectable as separate, influential elements.36 this interaction among features challenges the strict decompositional approach to feature audit and will likely lead to an increased focus on the relational analysis among and between elements.
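one simple, model-agnostic way to approximate a feature audit is permutation testing: shuffle one feature at a time and record how much the model's accuracy drops. this is offered as an illustrative stand-in, not as one of the specific techniques cited above; the `black_box` function, its weights, and the synthetic data are all hypothetical.

```python
import random


def black_box(row):
    """Hypothetical opaque model; the audit only calls it, never inspects it."""
    return 1 if 0.8 * row[0] + 0.2 * row[1] > 0.5 else 0  # row[2] is ignored


def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_audit(model, rows, labels, n_repeats=20, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # rebuild each row with only feature j replaced by a shuffled value
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            total += baseline - accuracy(model, shuffled, labels)
        drops.append(total / n_repeats)
    return drops


# synthetic data: three features in [0, 1); labels are the model's own outputs,
# so the audit measures what the model relies on, not how correct it is
data_rng = random.Random(1)
rows = [tuple(data_rng.random() for _ in range(3)) for _ in range(300)]
labels = [black_box(r) for r in rows]

drops = permutation_audit(black_box, rows, labels)
for name, drop in zip(["feature_0", "feature_1", "feature_2"], drops):
    print(f"{name}: mean accuracy drop {drop:.3f}")
```

a feature the model ignores shows no drop, while the heavily weighted feature shows the largest. shuffling one column at a time cannot reveal indirect influence between features, which is the limitation of strictly decompositional audits noted above.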
approximation and abstraction

approximation and abstraction are techniques that create a more simplified model to explain the more complex model.37 people seek and accept explanations that “satisfice”38 and are coherent with existing beliefs.39 this recognizes that “an explanation has greater power than an alternative if it makes what is being explained less surprising.”40 approaches such as “model distillation”41 or the “model agnostic” feature reduction of the local interpretable model-agnostic explanations (lime) tool create a simplified presentation of the algorithmic model.42 this approximation or abstraction may compromise accuracy, but it provides an accessible representation that enhances understandability.

a different type of approximation or abstraction is a narrative of the machine learning processes utilized that provides sufficient documentation to act as an explanation of the outcomes for a reader. an exemplary case of this is lithium-ion batteries: a machine-generated summary of current research published by springer nature and written by beta writer, an ai or more accurately a suite of algorithms.43 a collaboration of machine learning and human editors, the full production cycle of the book is documented in the introduction.44 in lieu of being able to interrogate the system directly, this detailed account provides an explanation of the system allowing readers to assess the strengths, limitations, and confidence levels of the algorithmic processes and offers a model of what might be necessary for future ai generated texts.45 libraries can utilize this documentation in acquisition or licensing decisions and subsequently make it available as user guides when resources are added to the collection.

reproducibility

replication is a verification strategy fundamental to science. being able to independently reproduce results in different settings provides evidence of veracity and supports user trust.
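independent reproduction depends on a run being fully specified. the sketch below, using only the python standard library, shows one assumed way to record a random seed, the software environment, and a fingerprint of the dataset alongside a seeded train/test split; the function names and the manifest fields are illustrative choices, not an established standard.

```python
import hashlib
import json
import platform
import random
import sys


def dataset_fingerprint(records):
    """SHA-256 of a canonical JSON dump, so a changed dataset is detectable."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def seeded_split(records, seed, test_fraction=0.25):
    """Shuffle a copy with a private, seeded generator, then cut it in two."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def run_manifest(records, seed):
    """Details another team would need in order to repeat the run."""
    return {
        "seed": seed,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "data_sha256": dataset_fingerprint(records),
    }


# invented records standing in for a research dataset
records = [{"id": i, "text": f"abstract {i}"} for i in range(8)]
train, test = seeded_split(records, seed=42)
manifest = run_manifest(records, seed=42)
print(json.dumps(manifest, indent=2))
```

rerunning `seeded_split` with the same seed and the same records returns the same partition, which is the property a replication attempt relies on.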
however, documented problems in reproducing machine learning studies have questioned the generalizability of these approaches and undermined their explanatory capacity. for example, an analysis of text mining studies using machine learning for citation screening in the preparation of systematic reviews revealed a lack of key elements to enable replicability (e.g., access to research datasets, software environments used, randomization control, and lack of detail on new methods proposed or employed).46 in response, a “reproducibility challenge” was created by the international conference on learning representations (iclr) to validate 2018 conference submissions and has continued in subsequent meetings.47 more rigorous replication through the availability of all necessary components and the development of standards will be important to this type of verification.48

xai by ai

the inherent complexity and opacity of unsupervised learning or reinforcement learning suggest, as xai researcher trevor darrell puts it, “the solution to explainable ai is more ai.”49 in this approach to explanation, oversight ai systems are positioned as intermediaries between an ai and its users:

workers have supervisors; businesses have accountants; schoolteachers have principals. we suggest that the time has come to develop ai oversight systems (“ai guardians”) that will seek to ensure that the various smart machines will not stray from the guidelines their programmers have provided.50

while the prospect of ai guardians may be dystopic, oversight systems performing roles that validate, interrogate, and report are common in code checking tools.
generative adversarial networks (gans) have been used to create counterfactual explanations of another machine learning model to enhance explainability.51 with strategic organizational and staffing changes to enhance capabilities, libraries can design and deploy such oversight or adversarial tools with objectives appropriate to the requirements and norms of libraries and the academy.

authorization

xai that results from authorizations is an area where public policy engagement is needed to ensure xai, and machine learning, are appropriately serving libraries, the academy, and the public at large. three examples are provided: codes and standards, regulation, and audit.

codes and standards

one approach to explanation, supported by the ai industry and professional organizations, is the adoption of voluntary codes or standards that encourage explanatory capabilities. these nonbinding principles are a type of self-regulation and are widely promoted as a means of assurance.52

the association for computing machinery’s statement on algorithms highlights seven principles as guides to system design and use: awareness, access and redress, accountability, explanation, data provenance, auditability, and validation and testing. however, the language used is tentative and conditional. designers are “encouraged” to provide explanations and to “encourage” a means for interrogation and auditing “where harm is suspected” (i.e., a post hoc process).
despite this, the statement concludes with a strong position on accountability if not explainability: “institutions should be held responsible for decisions made by the algorithms that they use, even if it is not feasible to explain in detail how the algorithms produce their results.”53

unfortunately, the optimism for self-regulation in explainability is undercut by the poor experience with voluntary mechanisms regarding privacy protection.54 in addition, library associations, library system vendors, and scholarly publishers have been slow to endorse any codes or standards regarding explainability.

regulation

the most common recommendation for ai oversight and authorization to ensure explainability is the creation of a regulatory agency. specific suggestions include a “neutral data arbiter” with investigative powers like the us federal trade commission,55 a food and drug administration “for algorithms,”56 a standing “commission on artificial intelligence,”57 quasi-governmental agencies such as the council of europe,58 and a hybrid agency model combining certification and liability.59 such agencies would have legislated or delegated powers to investigate, certify, license, and arbitrate on matters relating to ai and algorithms, including their design, use, and effects.
there are few calls for an international regulatory agency despite digitally porous national boundaries and the global reach of machine learning.60 that almost no such agencies have been created reveals the strength and influence of the large corporations responsible for developing and deploying most machine learning tools and systems.61 reports comparing regulatory approaches to ai among the european union, the united kingdom, the united states, and canada indicate significantly different approaches but with most proceeding with a “light touch” to avoid competitive disadvantages in a multitrillion dollar global marketplace.62 the introduction of the draft eu artificial intelligence act marks the first major jurisdiction to propose specific ai legislation.63 while the act is fulsome about high-risk ai, it is silent on any notion of “explainable” ai, preferring to focus on the less specific idea of “trustworthy artificial intelligence.” with this the eu appears to retreat from the idea of explainability in the gdpr. an exception to this inertia or backtracking is the development and use of algorithmic impact assessments in both governments and industry. these instruments help prospective users of an algorithmic decision-making system determine levels of explanatory requirements and standards to meet those requirements.64 canada has been a leader in this area with a protocol covering use of these systems in the federal government.65 some identify due process as a possible, if limited, remedy for explainability.66 however, a landmark us case suggests otherwise. in state v. 
loomis, regarding the use of compas, an algorithmic sentencing system, the court ruled on the role of explanation in due process:67

the wisconsin supreme court held that a trial court’s use of an algorithmic risk assessment in sentencing did not violate the defendant’s due process rights even though the methodology used to produce the assessment was disclosed neither to the court nor to the defendant.68

the petition of the loomis case to the us supreme court was denied, so a higher court ruling on this issue is unavailable.69

advocacy for regulations regarding explainability should be a central concern for libraries. without strong regulatory oversight requiring disclosure and accountability, machine learning systems will remain black boxes and the presence of these consequential systems in the lives of users will be obscured.

audit

a commonly recommended approach to ai oversight and explanation is third-party auditing.70 the use of audit and principles of auditing are widely accepted in a variety of areas.71 in a library context, auditing of ai can be thought of as a reviewing process to achieve transparency or to determine product compliance. auditing is typically done after system implementation, but it can be accomplished at any stage. it is possible to audit design specifications, completed code, or cognitive models, or to conduct periodic audits of specific decisions.72

the keys to successful audit oversight are clear audit goals and objectives (e.g., what is being audited and for what purpose), acknowledged expertise of the auditors, authority of the auditors to recommend, and authorization of the auditors to investigate. any such auditing responsibility for xai would require the trust of stakeholders such as ai designers, government regulators, and industry representatives, as well as users themselves.
critics of the audit approach have focused on lack of auditor expertise, algorithmic complexity, and the need for approaches that assess the algorithmic system prior to its release.73 while most audit recommendations assume a public agency in this role, an innovative suggestion is a crowdsourced audit (a form of audit study that involves the recruitment of testers to anonymously assess an algorithmic system; an xai form of the “secret shopper”).74 this approach resembles techniques used by consumer advocates and might indicate the rise of public activists in the xai arena.

the complexity of algorithms suggests that a precondition for an audit is “auditability.”75 this would require that ai be designed in such a way that an audit is possible (i.e., inspectable in some manner) while, presumably, not impairing its predictive performance. sandvig et al. propose regulatory changes because “rather than regulating for transparency or misbehavior, we find this situation argues for ‘regulation toward auditability’.”76

auditing is not without its difficulties. there are no industry standards for algorithmic auditing.77 a high-profile development was the recent launch of orcaa (orcaarisk.com), an algorithmic auditing company started by cathy o’neil, a data scientist who has written extensively about the perils of uncontrolled algorithms.78 however, the legitimacy of third-party auditing has been criticized as lacking public transparency and the capacity to demand change.79

while libraries may not be able to create their own auditing capacity, whether collectively or individually, they are encouraged to engage with the emerging algorithmic auditing community to shape auditing practices appropriate for scholarly communication.

xai as discovery

while xai is primarily a means to validate and authorize machine learning systems, another use of xai is gaining attention.
since xai can find new information latent in large and complex datasets, discovery is promoted as “one of the most important achievements of the entire algorithmic explainability project.”80 alkhateeb asks “can scientific discovery really be automated” while invoking the earlier work of swanson which mined the medical literature for new knowledge by connecting seemingly unrelated articles through search.81 an emerging reason for libraries to adopt xai may be as a powerful discovery tool.

conclusion

our lives have become “algorithmically mediated”82 where we are “dependent on computational spectacles to see the world.”83 academic libraries are now sites where systems, services, and collections are increasingly shaped and provided by machine learning. the predictions, recommendations, and decisions of machine learning systems are powerful as well as consequential. however, “the danger is not so much in delegating cognitive tasks, but in distancing ourselves from—or in not knowing about—the nature and precise mechanisms of that delegation.”84 taddeo notes that “delegation without supervision characterises the presence of trust.”85 xai is an essential tool to build that trust.

geoffrey hinton, a central figure in the development of machine learning,86 argues that requiring an explanation from an ai system would be “a complete disaster” and that trust and acceptance should be based on the system’s performance, not its explainability.87 this is consistent with the view of many that “if algorithms that cannot be easily explained consistently make better decisions in certain areas, then policymakers should not require an explanation.”88 both these views are at odds with the tenets of critical thought and assessment, and both challenge norms of algorithmic accountability.

xai is a dual opportunity for libraries.
on one hand, it is a set of techniques, processes, and strategies that enable the interrogation of the algorithmically driven resources that libraries provide to their users. on the other hand, it is a public policy arena where advocacy is necessary to promote and uphold the values of librarianship, the academy, and the public interest in the face of powerful new technologies. many disciplines have engaged with xai as machine learning has impacted their fields.89 xai has been called a “disruptive force” in lis,90 warranting the growing interest in how xai affects the field and how the field might influence it.

endnotes

1 vijay arya et al., “one explanation does not fit all: a toolkit and taxonomy of ai explainability techniques,” arxiv:1909.03012 [cs, stat], 2019, http://arxiv.org/abs/1909.03012; shane t. mueller et al., “explanation in human-ai systems: a literature meta-review, synopsis of key ideas and publications, and bibliography for explainable ai,” arxiv:1902.01876 [cs], 2019, http://arxiv.org/abs/1902.01876; ingrid nunes and dietmar jannach, “a systematic review and taxonomy of explanations in decision support and recommender systems,” user modeling and user-adapted interaction 27, no. 3 (2017): 393–444, https://doi.org/10.1007/s11257-017-9195-0; gesina schwalbe and bettina finzel, “xai method properties: a (meta-) study,” arxiv:2105.07190 [cs], 2021, http://arxiv.org/abs/2105.07190. 2 safiya noble, algorithms of oppression: how search engines reinforce racism (new york: new york university press, 2018); frank pasquale, the black box society: the secret algorithms that control money and information (cambridge, mass.: harvard university press, 2015); sara wachter-boettcher, technically wrong: sexist apps, biased algorithms, and other threats of toxic tech (new york: w. w. norton, 2017).
3 abeba birhane et al., “the values encoded in machine learning research,” arxiv:2106.15590 [cs], 2021, http://arxiv.org/abs/2106.15590; taina bucher, if ... then: algorithmic power and politics (new york: oxford university press, 2018); sarah myers west, meredith whittaker, and kate crawford, discriminating systems: gender, race, and power in ai (ai now institute, 2019), https://ainowinstitute.org/discriminatingsystems.html. 4 rao aluri and donald e. riggs, “application of expert systems to libraries,” ed. joe a. hewitt, advances in library automation and networking 2 (1988): 1–43; ryan cordell, machine learning + libraries: a report on the state of the field (washington dc: library of congress, 2020), https://labs.loc.gov/static/labs/work/reports/cordell-loc-ml-report.pdf; jason griffey, ed., “artificial intelligence and machine learning in libraries,” library technology reports 55, no. 1 (2019), https://doi.org/10.5860/ltr.55n1; guoying liu, “the application of intelligent agents in libraries: a survey,” program: electronic library and information systems 45, no. 1 (2011): 78–97, https://doi.org/10.1108/00330331111107411; linda c. smith, “artificial intelligence in information retrieval systems,” information processing and management 12, no. 3 (1976): 189–222, https://doi.org/10.1016/0306-4573(76)90005-4. 5 jenny bunn, “working in contexts for which transparency is important: a recordkeeping view of explainable artificial intelligence (xai),” records management journal (london, england) 30, no. 2 (2020): 143–53, https://doi.org/10.1108/rmj-08-2019-0038; cordell, “machine learning + libraries”; andrew m. 
cox, the impact of ai, machine learning, automation and robotics on the information professions (cilip, 2021), http://www.cilip.org.uk/resource/resmgr/cilip/research/tech_review/cilip_–_ai_report_-_final_lo.pdf; daniel johnson, machine learning, libraries, and cross-disciplinary research: possibilities and provocations (notre dame, indiana: hesburgh libraries, university of notre dame, 2020), https://dx.doi.org/10.7274/r0-wxg0-pe06; sarah lippincott, mapping the current landscape of research library engagement with emerging technologies in research and learning (washington dc: association of research libraries, 2020), https://www.arl.org/wp-content/uploads/2020/03/2020.03.25-emerging-technologies-landscape-summary.pdf; thomas padilla, responsible operations: data science, machine learning, and ai in libraries (dublin, oh: oclc research, 2019), https://doi.org/10.25333/xk7z-9g97; michael ridley, “explainable artificial intelligence,” research library issues, no. 299 (2019): 28–46, https://doi.org/10.29242/rli.299.3. 6 ridley, “explainable artificial intelligence,” 42.
7 bunn, “working in contexts for which transparency is important,” 151. 8 sebastian palacio et al., “xai handbook: towards a unified framework for explainable ai,” arxiv:2105.06677 [cs], 2021, http://arxiv.org/abs/2105.06677; sahil verma et al., “pitfalls of explainable ml: an industry perspective,” in mlsys journe workshop, 2021, http://arxiv.org/abs/2106.07758; giulia vilone and luca longo, “explainable artificial intelligence: a systematic review,” arxiv:2006.00093 [cs], 2020, http://arxiv.org/abs/2006.00093. 9 wojciech samek and klaus-robert muller, “towards explainable artificial intelligence,” in explainable ai: interpreting, explaining and visualizing deep learning, ed. wojciech samek et al., lecture notes in artificial intelligence 11700 (cham: springer international publishing, 2019), 17. 10 mueller et al., “explanation in human-ai systems.” 11 isto huvila et al., “information behavior and practices research informing information systems design,” journal of the association for information science and technology, 2021, 1–15, https://doi.org/10.1002/asi.24611. 12 darpa, explainable artificial intelligence (xai) (arlington, va: darpa, 2016), http://www.darpa.mil/attachments/darpa-baa-16-53.pdf. 13 matt turek, “explainable artificial intelligence (xai),” darpa, https://www.darpa.mil/program/explainable-artificial-intelligence. 14 julie gerlings, arisa shollo, and ioanna constantiou, “reviewing the need for explainable artificial intelligence (xai),” in proceedings of the hawaii international conference on system sciences, 2020, http://arxiv.org/abs/2012.01007. 15 william j. clancey, “the epistemology of a rule-based expert system—a framework for explanation,” artificial intelligence 20, no. 
3 (1983): 215–51, https://doi.org/10.1016/0004-3702(83)90008-5; william swartout, “xplain: a system for creating and explaining expert consulting programs,” artificial intelligence 21 (1983): 285–325; william swartout, cecile paris, and johanna moore, “design for explainable expert systems,” ieee expert-intelligent systems & their applications 6, no. 3 (1991): 58–64, https://doi.org/10.1109/64.87686. 16 european union, “regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016,” 2016, http://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:32016r0679. 17 lilian edwards and michael veale, “slave to the algorithm? why a ‘right to explanation’ is probably not the remedy you are looking for,” duke law & technology review 16 (2017): 18–84; bryce goodman and seth flaxman, “european union regulations on algorithmic decision making and a ‘right to explanation’,” ai magazine 38, no. 3 (2017): 50–57, https://doi.org/10.1609/aimag.v38i3.2741; margot e. kaminski, “the right to explanation, explained,” berkeley technology law journal 34, no.
1 (2019): 189–218, https://doi.org/10.15779/z38td9n83h; sandra wachter, brent mittelstadt, and luciano floridi, “why a right to explanation of automated decision-making does not exist in the general data protection regulation,” international data privacy law 7, no. 2 (2017): 76–99, https://doi.org/10.1093/idpl/ipx005. 18 amina adadi and mohammed berrada, “peeking inside the black-box: a survey on explainable artificial intelligence (xai),” ieee access 6 (2018): 52138–60, https://doi.org/10.1109/access.2018.2870052; mueller et al., “explanation in human-ai systems”; vilone and longo, “explainable artificial intelligence.” 19 schwalbe and finzel, “xai method properties.” 20 or biran and courtenay cotton, “explanation and justification in machine learning: a survey” (international joint conference on artificial intelligence, workshop on explainable artificial intelligence (xai), melbourne, 2017), http://www.cs.columbia.edu/~orb/papers/xai_survey_paper_2017.pdf. 21 padilla, responsible operations. 22 jenna burrell and marion fourcade, “the society of algorithms,” annual review of sociology 47, no. 1 (2021): 231, https://doi.org/10.1146/annurev-soc-090820-020800. 23 nick seaver, “seeing like an infrastructure: avidity and difference in algorithmic recommendation,” cultural studies 35, no. 4–5 (2021): 775, https://doi.org/10.1080/09502386.2021.1895248. 24 frank c. keil, “explanation and understanding,” annual review of psychology 57 (2006): 227–54, https://doi.org/10.1146/annurev.psych.57.102904.190100. 25 donald a. norman, “some observations on mental models,” in mental models, ed. dedre gentner and albert l. stevens (new york: psychology press, 1983), 7–14.
26 ashraf abdul et al., “trends and trajectories for explainable, accountable, and intelligible systems: an hci research agenda,” in proceedings of the 2018 chi conference on human factors in computing systems, chi ’18 (new york: acm, 2018), 582:1–582:18, https://doi.org/10.1145/3173574.3174156; joachim diederich, “methods for the explanation of machine learning processes and results for non-experts,” psyarxiv, 2018, https://doi.org/10.31234/osf.io/54eub. 27 pigi kouki et al., “user preferences for hybrid explanations,” in proceedings of the eleventh acm conference on recommender systems, recsys ’17 (new york, ny: acm, 2017), 84–88, https://doi.org/10.1145/3109859.3109915. 28 tim miller, “explanation in artificial intelligence: insights from the social sciences,” artificial intelligence 267 (2019): 3, https://doi.org/10.1016/j.artint.2018.07.007. 29 jenna burrell, “how the machine ‘thinks’: understanding opacity in machine learning algorithms,” big data & society 3, no. 1 (2016), https://doi.org/10.1177/2053951715622512. 30 duri long and brian magerko, “what is ai literacy? competencies and design considerations,” in proceedings of the 2020 chi conference on human factors in computing systems, chi ’20 (honolulu, hi: association for computing machinery, 2020), 2, https://doi.org/10.1145/3313831.3376727.
31 michael ridley and danica pawlick-potts, “algorithmic literacy and the role for libraries,” information technology and libraries 40, no. 2 (2021), https://doi.org/10.6017/ital.v40i2.12963. 32 waddah saeed and christian omlin, “explainable ai (xai): a systematic meta-survey of current challenges and future opportunities,” arxiv:2111.06420 [cs], 2021, http://arxiv.org/abs/2111.06420. 33 shane t. mueller et al., “principles of explanation in human-ai systems” (explainable agency in artificial intelligence workshop, aaai 2021), http://arxiv.org/abs/2102.04972. 34 sebastian bach et al., “on pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation,” plos one 10, no. 7 (2015): e0130140, https://doi.org/10.1371/journal.pone.0130140; biran and cotton, “explanation and justification in machine learning: a survey”; chris brinton, “a framework for explanation of machine learning decisions” (ijcai-17 workshop on explainable ai (xai), melbourne: ijcai, 2017), http://www.intelligentrobots.org/files/ijcai2017/ijcai-17_xai_ws_proceedings.pdf; chris olah, alexander mordvintsev, and ludwig schubert, “feature visualization,” distill, november 7, 2017, https://doi.org/10.23915/distill.00007. 35 edwards and veale, “slave to the algorithm?” 36 philip adler et al., “auditing black-box models for indirect influence,” knowledge and information systems 54 (2018): 95–122, https://doi.org/10.1007/s10115-017-1116-3. 37 alisa bokulich, “how scientific models can explain,” synthese 180, no. 1 (2011): 33–45, https://doi.org/10.1007/s11229-009-9565-1; keil, “explanation and understanding.” 38 herbert a. simon, “what is an ‘explanation’ of behavior?,” psychological science 3, no. 3 (1992): 150–61, https://doi.org/10.1111/j.1467-9280.1992.tb00017.x. 39 norbert schwarz et al., “ease of retrieval as information: another look at the availability heuristic,” journal of personality and social psychology 61, no.
2 (1991): 195–202, https://doi.org/10.1037/0022-3514.61.2.195; paul thagard, “evaluating explanations in law, science, and everyday life,” current directions in psychological science 15, no. 3 (2006): 141–45, https://doi.org/10.1111/j.0963-7214.2006.00424.x. 40 tania lombrozo, “explanatory preferences shape learning and inference,” trends in cognitive sciences 20, no. 10 (2016): 756, https://doi.org/10.1016/j.tics.2016.08.001. 41 sarah tan et al., “detecting bias in black-box models using transparent model distillation,” arxiv:1710.06169 [cs, stat], november 18, 2017, http://arxiv.org/abs/1710.06169. 42 marco tulio ribeiro, sameer singh, and carlos guestrin, “model-agnostic interpretability of machine learning,” arxiv:1606.05386 [cs, stat], 2016, http://arxiv.org/abs/1606.05386. 43 beta writer, lithium-ion batteries: a machine-generated summary of current research (heidelberg: springer nature, 2019), https://link.springer.com/book/10.1007/978-3-030-16800-1. 44 henning schoenenberger, christian chiarcos, and niko schenk, preface to lithium-ion batteries: a machine-generated summary of current research, by beta writer (heidelberg: springer international publishing, 2019).
45 michael ridley, “machine information behaviour,” in the rise of ai: implications and applications of artificial intelligence in academic libraries, ed. sandy hervieux and amanda wheatley (association of college and university libraries, 2022). 46 babatunde kazeem olorisade, pearl brereton, and peter andras, “reproducibility of studies on text mining for citation screening in systematic reviews: evaluation and checklist,” journal of biomedical informatics 73 (2017): 1–13, https://doi.org/10.1016/j.jbi.2017.07.010; babatunde k. olorisade, pearl brereton, and peter andras, “reproducibility in machine learning-based studies: an example of text mining,” in reproducibility in ml workshop (international conference on machine learning, sydney, australia, 2017), https://openreview.net/pdf?id=by4l2pbq-. 47 joelle pineau, “reproducibility challenge,” october 6, 2017, http://www.cs.mcgill.ca/~jpineau/iclr2018-reproducibilitychallenge.html. 48 benjamin haibe-kains et al., “transparency and reproducibility in artificial intelligence,” nature 586, no. 7829 (2020): e14–e16, https://doi.org/10.1038/s41586-020-2766-y; benjamin j. heil et al., “reproducibility standards for machine learning in the life sciences,” nature methods, august 30, 2021, https://doi.org/10.1038/s41592-021-01256-7. 49 cliff kuang, “can a.i. be taught to explain itself?,” the new york times magazine, november 21, 2017, 50, https://nyti.ms/2hr1s15. 50 amitai etzioni and oren etzioni, “incorporating ethics into artificial intelligence,” the journal of ethics 21, no. 4 (2017): 403–18, https://doi.org/10.1007/s10892-017-9252-2. 51 kamran alipour et al., “improving users’ mental model with attention-directed counterfactual edits,” applied ai letters, 2021, e47, https://doi.org/10.1002/ail2.47. 
52 association for computing machinery, statement on algorithmic transparency and accountability (new york: acm, 2017), http://www.acm.org/binaries/content/assets/public-policy/2017_joint_statement_algorithms.pdf; alex campolo et al., ai now 2017 report (new york: ai now institute, 2017); ieee, ethically aligned design: a vision for prioritizing human wellbeing with artificial intelligence and autonomous systems (new york: ieee, 2019), https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/other/ead1e.pdf. 53 association for computing machinery, statement on algorithmic transparency and accountability, 2. 54 lilian edwards and michael veale, “enslaving the algorithm: from a ‘right to an explanation’ to a ‘right to better decisions’?,” ieee security & privacy 16, no. 3 (2018): 46–54. 55 kate crawford and jason schultz, “big data and due process: toward a framework to redress predictive privacy harms,” boston college law review 55, no. 1 (2014): 93–128. 56 andrew tutt, “an fda for algorithms,” administrative law review 69, no. 1 (2017): 83–123.
57 corinne cath et al., “artificial intelligence and the ‘good society’: the us, eu, and uk approach,” science and engineering ethics, march 28, 2017, https://doi.org/10.1007/s11948-017-9901-7. 58 edwards and veale, “slave to the algorithm?” 59 matthew u. scherer, “regulating artificial intelligence systems: risks, challenges, competencies, and strategies,” harvard journal of law & technology 29, no. 2 (2016): 353–400. 60 roger brownsword, “from erewhon to alphago: for the sake of human dignity, should we destroy the machines?,” law, innovation and technology 9, no. 1 (january 2, 2017): 117–53, https://doi.org/10.1080/17579961.2017.1303927. 61 birhane et al., “the values encoded in machine learning research”; ana brandusescu, artificial intelligence policy and funding in canada: public investments, private interests (montreal: centre for interdisciplinary research on montreal, mcgill university, 2021). 62 cath et al., “artificial intelligence and the ‘good society’”; law commission of ontario and céline castets-renard, comparing european and canadian ai regulation, 2021, https://www.lco-cdo.org/wp-content/uploads/2021/12/comparing-european-and-canadian-ai-regulation-final-november-2021.pdf. 63 european commission, “artificial intelligence act,” 2021, https://eur-lex.europa.eu/legal-content/en/txt/?uri=celex:52021pc0206. 64 dillon reisman et al., algorithmic impact assessment: a practical framework for public agency accountability (new york: ai now institute, 2018), https://ainowinstitute.org/aiareport2018.pdf. 65 treasury board of canada secretariat, “directive on automated decision-making,” 2019, http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=32592.
66 danielle keats citron and frank pasquale, “the scored society: due process for automated predictions,” washington law review 89 (2014): 1–33; scherer, “regulating artificial intelligence systems.” 67 julia angwin et al., “machine bias,” propublica, may 23, 2016, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. 68 “state v. loomis,” harvard law review 130, no. 5 (2017), https://harvardlawreview.org/2017/03/state-v-loomis/. 69 “loomis v. wisconsin,” scotusblog, june 26, 2017, http://www.scotusblog.com/case-files/cases/loomis-v-wisconsin/. 70 brownsword, “from erewhon to alphago”; campolo et al., ai now 2017 report; ieee, ethically aligned design; pasquale, the black box society: the secret algorithms that control money and information; wachter, mittelstadt, and floridi, “why a right to explanation.” 71 michael power, the audit society: rituals of verification (oxford: oxford university press, 1997).
72 alfred ng, “can auditing eliminate bias from algorithms?,” the markup, february 23, 2021, https://themarkup.org/ask-the-markup/2021/02/23/can-auditing-eliminate-bias-from-algorithms. 73 joshua alexander kroll, “accountable algorithms” (phd diss., princeton university, 2015). 74 christian sandvig et al., “auditing algorithms: research methods for detecting discrimination on internet platforms,” data and discrimination: converting critical concerns into productive inquiry, 2014, http://www-personal.umich.edu/~csandvig/research/auditing%20algorithms%20--%20sandvig%20--%20ica%202014%20data%20and%20discrimination%20preconference.pdf. 75 association for computing machinery, statement on algorithmic transparency and accountability. 76 sandvig et al., “auditing algorithms,” 17. 77 ng, “can auditing eliminate bias from algorithms?” 78 cathy o’neil, weapons of math destruction: how big data increases inequality and threatens democracy (new york: crown, 2016). 79 emanuel moss et al., assembling accountability: algorithmic impact assessment for the public interest (data & society, 2021), https://datasociety.net/wp-content/uploads/2021/06/assembling-accountability.pdf. 80 david s. watson and luciano floridi, “the explanation game: a formal framework for interpretable machine learning,” synthese (dordrecht) 198, no. 10 (2020): 9214, https://doi.org/10.1007/s11229-020-02629-9.
81 ahmed alkhateeb, “science has outgrown the human mind and its limited capacities,” aeon, april 24, 2017, https://aeon.co/ideas/science-has-outgrown-the-human-mind-and-its-limited-capacities; don r. swanson, “undiscovered public knowledge,” the library quarterly 56, no. 2 (1986): 103–18; don r. swanson, “medical literature as a potential source of new knowledge,” bulletin of the medical library association 78, no. 1 (1990): 29–37. 82 jack anderson, “understanding and interpreting algorithms: toward a hermeneutics of algorithms,” media, culture & society 42, no. 7–8 (2020): 1479–94, https://doi.org/10.1177/0163443720919373. 83 ed finn, “algorithm of the enlightenment,” issues in science and technology 33, no. 3 (2017): 24.
84 jos de mul and bibi van den berg, “remote control: human autonomy in the age of computer-mediated agency,” in law, human agency, and autonomic computing, ed. mireille hildebrandt and antoinette rouvroy (abingdon: routledge, 2011), 59. 85 mariarosaria taddeo, “trusting digital technologies correctly,” minds and machines 27, no. 4 (2017): 565, https://doi.org/10.1007/s11023-017-9450-5. 86 cade metz, genius makers: the mavericks who brought ai to google, facebook, and the world (dutton, 2021). 87 tom simonite, “google’s ai guru wants computers to think more like brains,” wired, december 12, 2018, https://www.wired.com/story/googles-ai-guru-computers-think-more-like-brains/. 88 nick wallace, “eu’s right to explanation: a harmful restriction on artificial intelligence,” techzone, january 25, 2017, http://www.techzone360.com/topics/techzone/articles/2017/01/25/429101-eus-right-explanation-harmful-restriction-artificial-intelligence.htm#. 89 mueller et al., “explanation in human-ai systems.” 90 bunn, “working in contexts for which transparency is important,” 143.