ISSN: 2474-3542
Journal homepage: http://journal.calaijol.org

INTERNATIONAL JOURNAL OF LIBRARIANSHIP, 3(1), 3-20

Linked Data Technologies and What Libraries Have Accomplished So Far

Yongming Wang, The College of New Jersey
Sharon Q. Yang, Rider University

To cite this article: Wang, Y., & Yang, S. Q. (2018). Linked Data Technologies and What Libraries Have Accomplished So Far. International Journal of Librarianship, 3(1), 3-20. doi: https://doi.org/10.23974/ijol.2018.vol3.1.62

ABSTRACT

For the past ten years libraries have been working diligently towards Linked Data and the Semantic Web. Because of the complexity and vast scope of Linked Data, many people have a hard time understanding its technical details and its potential for the library community. This paper aims to help librarians better understand some important concepts by explaining the basic Linked Data technologies: the Resource Description Framework (RDF), the ontology, and the query language. It also includes an overview of the achievements of libraries around the world in their efforts to turn library data into Linked Data, including those of the Library of Congress, OCLC, and some other national libraries. Some of the challenges and setbacks that libraries have encountered are analyzed and discussed. In spite of the difficulties, there is no way to turn back. Libraries will have to succeed.

Keywords: Linked Data, Semantic Web, Resource Description Framework, BIBFRAME, Library of Congress, OCLC

INTRODUCTION

What is Linked Data? According to David Wood, the co-chair of the RDF Working Group of the W3C (World Wide Web Consortium), which lays the foundation for Linked Data and the Semantic Web, “Linked Data is a set of techniques to represent and connect structured data on the web… Linked Data makes the World Wide Web into a global database that we call the Web of Data” (Wood, Zaidman, Ruth, & Hausenblas, 2014). Linked Data technologies, together with the broader concept of the Semantic Web, have gained rapid momentum and popularity on the World Wide Web.

Linked Data technologies hold the potential to evolve the current Web of documents into the Web of Data. Imagine that in the future Internet world, not only web documents but all data are connected. More importantly, these connected data are accessible not only to humans but also to machines. In other words, all devices that are connected to the Internet can access and process those linked data and thereby make smart decisions automatically. This will greatly enhance the way we access information and make informed decisions. The ideas are not new.
As early as the late 1990s, Tim Berners-Lee (2000), the inventor of the World Wide Web, had a vision for the Semantic Web:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.”

In 2004, W3C published the first recommendation of the data model for Linked Data, RDF 1.0. In 2005, W3C formed the Semantic Web Interest Group. And in 2006, Tim Berners-Lee published the Linked Data principles and design rules, which paved the way for large-scale adoption and development of Linked Data technologies (Berners-Lee, 2006).

The last ten-plus years have witnessed rapid adoption and usage of Linked Data by companies small and large. Companies such as Google and Facebook use Linked Data to enhance their searching capability and connections (Wood et al., 2014). The retail company BestBuy uses Linked Data to improve its bottom line (Wood et al., 2014).

LITERATURE REVIEW

As early as 2011, the Library Linked Data Incubator Group (2011) published its final report as a W3C Incubator Group Report. This group consisted of international experts in the library and information fields who specialize in the Semantic Web and metadata. The report surveyed the current situation of Linked Data, summarized the use cases, and made some important recommendations for implementing Linked Data in the library community.

Another international effort that closely relates to the Semantic Web and Linked Data is the annual international conference on Dublin Core and Metadata Applications held by the Dublin Core Metadata Initiative (DCMI).
This annual conference started in 1995 as workshops only and in 2001 expanded to full conferences with the addition of tutorials, presentations, and peer-reviewed papers. From early on, DCMI has tackled issues related to the Semantic Web, especially ontologies and vocabularies. The theme of the 2005 conference was “Vocabularies in Practice.” One paper in that year’s proceedings introduced the concept of SKOS (Simple Knowledge Organization System) and recommended a way to use SKOS Core and DCMI Metadata Terms in combination (Miles, 2005). One project report in the 2009 conference proceedings, titled “Research on Linked Data and Co-reference Resolution,” described the transformation of a dataset of academic authors and their publications into Linked Data (Glaser, 2009). This is one of the earliest publications on Linked Data applications in the library community. Since 2012, there has been an increasing focus on the topic of Linked Data in this annual conference series.

On the more practical side, Karen Coyle (2012) published “Linked Data Tools: Connecting on the Web” in Library Technology Reports. In this report, she introduced the basic technologies of Linked Data in a tutorial format. A year later, Erik T. Mitchell (2013) published “Library Linked Data: Research and Adoption” in Library Technology Reports, in which he discussed the development of and research on Linked Data in the library community. In 2016, Mitchell (2016) published another report dealing with library adoption and practice of Linked Data, entitled “Library Linked Data: Early Activity and Development.” The three reports by Coyle and Mitchell have played an important role in helping librarians learn about Linked Data.

Since 2015, more articles on case studies and use examples of Linked Data have been published.
Karim Tharani’s article (2015) explores, by way of a case study, the possibility of using BIBFRAME to harvest and share bibliographic data as linked data. The article by Jin, Hahn, and Croll (2016) describes their project of transforming and enriching nearly 300,000 e-book MARC records into BIBFRAME records while increasing the discoverability and accessibility of those resources. Kimmy Szeto’s article (2017) explores how linked open data can transform and enhance the discovery and search of music resources.

Recently, another project by OCLC’s PCC (Program for Cooperative Cataloging) was carried out to transform legacy library metadata, that is, MARC records, into Linked Data. The major task of this project is to create a Linked Data authority control database by aggregating the traditional MARC records of people, organizations, and places from many sources and converting them into Linked Data. As stated in the PCC 2015-2017 strategic directions (Godby & Smith-Yoshimura, 2017):

“Existing methods of library authority control are based on constructing unique authorized access points as text strings (literals). This string-based approach works somewhat well in the closed environment of a traditional library catalog, but not in an open environment where data are shared and linked, and so require unique identifiers. The web presents both a challenge and an opportunity for libraries, which are now in a position to take advantage of data created outside of the library world, and also to contribute library authority data for use by other communities” (p.20).

Another issue in transforming library legacy metadata into Linked Data is that there are several efforts from different library organizations, resulting in different conceptual models. Zapounidou et al.
(2017) compare four conceptual models, namely Functional Requirements for Bibliographic Records (FRBR), FRBR Object-Oriented (FRBRoo), Bibliographic Framework (BIBFRAME), and Europeana Data Model (EDM), and try to find the common ground and convergences among them so that the goal of interoperability can be realized.

There are many publications on Linked Data, but librarians sometimes still find it a challenge to understand the concept. According to Banerjee (2017): “Even though librarians have read about and attended sessions discussing topics such as linked data and FRBR (Functional Requirements for Bibliographic Records) for more than a decade, they still find these things confusing.” (p.21)

LINKED DATA TECHNOLOGY

Resource Description Framework (RDF) Data Model

Simply put, a data model is an abstraction of real data and their relationships. The most familiar data model we encounter is the tabular data model, such as a CSV file, which lists data in a table-structured format. The data model for Linked Data is the Resource Description Framework (RDF). It is the way to represent data or resources on the Web. RDF is the most important concept to grasp in order to understand Linked Data.

To understand RDF, we first need to know what a URI (Uniform Resource Identifier) is. In a nutshell, a URI defines a unique address for anything on the Internet, much like the postal address for every home on earth. That “anything” includes not only physical entities such as an apple or Leonardo da Vinci, but also abstract concepts like love and peace. Take for example the URI http://example.org/yongming-wang: it is unique on the whole Internet and refers to one of the authors of this article, Yongming Wang. (Note that example.org is not a real website. By web convention it is reserved for demonstrating examples, and anyone can use it for that purpose.)
The above URI is also a URL. In other words, a URL is one type of URI on the Web. More recently, RDF 1.1 generalized URIs to IRIs (Internationalized Resource Identifiers), which allow a wider range of Unicode characters.

So, what exactly is RDF? And what role does it play in Linked Data technology? RDF is a data model used to identify or describe things (also called entities) and their relationships on the Web. An RDF statement, also called an RDF triple, has three parts: subject, predicate, and object. The subject and object designate two separate things, and the predicate describes the relationship between the subject and the object. Its format goes like this:

    <subject> <predicate> <object>

Here is an example:

    <Bob> <is a friend of> <Alice>.

It can be expressed in a graph as seen in Figure 1.

Figure 1: Part of Informal graph of the sample triples. Reprinted from W3C RDF 1.1 Primer (1)

(1) Retrieved from https://www.w3.org/TR/rdf11-primer/#fig1. Copyright © [24 June 2014] World Wide Web Consortium, (MIT, ERCIM, Keio, Beihang). http://www.w3.org/Consortium/Legal/2015/doc-license

According to the Linked Data principles, a subject must be a Uniform Resource Identifier (URI), and an object can be either a URI or a literal. The predicate should be defined by an RDF vocabulary and has to be a URI. An RDF vocabulary is like a schema in a relational database. The difference is that while a schema defines the scope and type of the data, an RDF vocabulary defines the relationships between data elements.

Therefore, in the above example, the subject is stated in a format like http://example.org/bob/, which is a unique URI on the Web, or a unique URL (HTTP URI). The object is another unique URL such as http://example.org/alice. Lastly, the predicate is also presented in the form of a URI such as http://example.org/friendOf/. To write the statement “Bob is a friend of Alice” as a formal RDF statement, it should look like this:

    <http://example.org/bob/> <http://example.org/friendOf/> <http://example.org/alice> .

Multiple triples can be shown as a graph describing multiple things and their relationships.
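As a rough illustration of the triple structure described above, the following Python sketch stores triples as plain (subject, predicate, object) tuples, groups them by subject into a simple graph, and writes each one in the angle-bracket statement form. The example.org URIs are the illustrative ones used in the text; the helper names `to_graph` and `to_statement` are hypothetical and introduced here only for demonstration, and the sketch assumes every term is a URI (literals are not handled).

```python
from collections import defaultdict

# Illustrative URIs from the text; example.org is reserved for examples.
BOB = "http://example.org/bob/"
ALICE = "http://example.org/alice"
FRIEND_OF = "http://example.org/friendOf/"

# Each RDF triple is modeled as a (subject, predicate, object) tuple.
triples = [
    (BOB, FRIEND_OF, ALICE),
]


def to_graph(triples):
    """Group triples by subject: {subject: [(predicate, object), ...]}."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


def to_statement(triple):
    """Write a URI-only triple as '<s> <p> <o> .' (assumes no literals)."""
    subject, predicate, obj = triple
    return f"<{subject}> <{predicate}> <{obj}> ."


if __name__ == "__main__":
    for triple in triples:
        print(to_statement(triple))
    print(to_graph(triples))
```

Adding more tuples to the list extends the graph in the same way that adding more triples extends a graph of Linked Data.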
Furthermore, the beauty of Linked Data is that those multiple things and relationships reside across the Internet at different locations (that is, on different web servers). Application developers can write applications to aggregate those things (data) on the fly. Users can follow those links (the relationships) to find more information relevant to their interests. See Figure 2 for an example.

Figure 2: Informal graph of the sample triples. Reprinted from W3C RDF 1.1 Primer (2)

(2) Retrieved from https://www.w3.org/TR/rdf11-primer/#fig1. Copyright © [24 June 2014] World Wide Web Consortium, (MIT, ERCIM, Keio, Beihang). http://www.w3.org/Consortium/Legal/2015/doc-license

There are six triples in Figure 2, in the form of <subject> <predicate> <object>. They are:

    <Bob> <is a person>.
    <Bob> <is a friend of> <Alice>.
    <Bob> <is born on> <the 4th of July 1990>.
    <Bob> <is interested in> <the Mona Lisa>.
    <the Mona Lisa> <was created by> <Leonardo da Vinci>.
    <the video 'La Joconde à Washington'> <is about> <the Mona Lisa>

This kind of graph can be extended without limit. In this way, almost everything on the Web can be uniquely described and linked together. Eventually, the graph will look like the giant graph in Figure 3.

Figure 3: Linking Open Data cloud diagram 2017 (3)

(3) Reprinted from The Linking Open Data cloud diagram, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. Retrieved from http://lod-cloud.net/. CC-BY-SA license.

RDF Vocabularies

RDF vocabulary is a more challenging topic than other Linked Data concepts such as the RDF data model discussed above and RDF serialization discussed later. Much of the confusion about Linked Data, and much of its slow adoption, is caused by the complexity of RDF vocabularies. Simply put, RDF vocabularies are like the schema in a relational database. According to Wood et al. (2014):

“[RDF Vocabularies] provide definitions of the terms used to make relationships between data elements.
Unlike a relational database’s schema, however, RDF vocabularies are distributed over the Web and are developed by people all over the world, and only come into common use in Linked Data if a lot of people choose to use them.” (p. 38)

An RDF vocabulary is itself identified by HTTP URIs and is defined and maintained at a well-known web location. It is used in the predicate position of RDF triples to define the relationship between a subject and an object. Let us take the previous example of “Bob is a friend of Alice” to show how an RDF vocabulary is used. As shown before, the RDF statement for “Bob is a friend of Alice” is written like this:

    <http://example.org/bob/> <http://example.org/friendOf/> <http://example.org/alice> .

To use an RDF vocabulary in the predicate position in real life, we need to replace <http://example.org/friendOf/> with <http://xmlns.com/foaf/0.1/knows>.

Let’s take a close look at it. There are basically two parts in http://xmlns.com/foaf/0.1/knows: http://xmlns.com/foaf/0.1/ is the namespace of the FOAF (Friend of a Friend) Vocabulary, and the “knows” part is a property of the FOAF Vocabulary. FOAF is an open source project developed in mid-2000 for linking people on the Web. FOAF is widely used for social networking by many Linked Data projects. When we want to state that someone is a friend of someone else, we can use this FOAF property, and any Linked Data program can automatically recognize and understand its meaning.

As mentioned above, the FOAF Vocabulary is in HTTP URI format (http://xmlns.com/foaf/0.1/), and it is kept at a well-known web location (http://xmlns.com). A namespace is generally considered a container that uniquely identifies a set of names or properties. In the namespace “foaf”, short for http://xmlns.com/foaf/0.1/, all its properties, including the one we use here, “knows”, are centrally maintained and uniquely defined. In other words, there is no ambiguity that “foaf:knows” (short for http://xmlns.com/foaf/0.1/knows) refers to a relationship between two persons.
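The relationship between the short form “foaf:knows” and the full URI can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular RDF library does it; the `PREFIXES` table and the `expand` and `shorten` helpers are hypothetical names introduced here.

```python
# Maps a prefix label to the namespace URI it stands for.
# Only FOAF is listed; other vocabularies could be added the same way.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(short_name, prefixes=PREFIXES):
    """Turn a prefixed name such as 'foaf:knows' into its full URI."""
    prefix, sep, local = short_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {short_name!r}")
    return prefixes[prefix] + local


def shorten(uri, prefixes=PREFIXES):
    """Turn a full URI into a prefixed name when a namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known namespace: leave the URI unchanged


if __name__ == "__main__":
    print(expand("foaf:knows"))
    print(shorten("http://xmlns.com/foaf/0.1/knows"))
```

Because the namespace part is a globally unique URI, two vocabularies that both define a term called “knows” still expand to different full URIs.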
We will never confuse it with other “knows.” In this sense, “foaf” serves as the prefix that stands for the namespace.

In order to fully understand RDF vocabularies, and especially to be able to create your own, it is essential to learn the two key components of RDF vocabulary: Resource Description Framework Schema (RDFS) and Web Ontology Language (OWL). RDFS is the definition language for RDF vocabularies. It provides the constructs, such as classes and properties, that are used to create new RDF vocabularies. OWL builds on RDF and RDFS to express richer ontologies. Because of their complexity and the length limitation of this paper, the authors will not elaborate on them here.

Another important RDF vocabulary is the Simple Knowledge Organization System (SKOS). The main purpose of SKOS is to turn traditional controlled vocabularies, such as thesauri and all sorts of subject headings (e.g., Library of Congress Subject Headings), into RDF vocabularies. This feature makes SKOS especially important for the library community.

For a better understanding of RDFS, OWL, and SKOS, the authors recommend the book “Semantic Web for the Working Ontologist” by Dean Allemang and James Hendler (Allemang & Hendler, 2012). Because the book includes many examples and is written in an easy and light style, Allemang and Hendler make learning the rather difficult topics of semantic modeling and ontology an easy task.

RDF Serialization

RDF serialization is the way RDF statements are written so that a computer program can read and process them. There are different types of RDF serialization. The common ones are Turtle (short for Terse RDF Triple Language), RDF/XML (the original RDF format in XML), RDFa (RDF embedded in HTML attributes), and the newer and more popular JSON-LD. This paper will focus on JSON-LD.
JSON-LD, short for JavaScript Object Notation (JSON) for Linking Data, became popular because JSON is a favorite data format among web developers and almost all programming languages have multiple libraries to parse it. JSON is easy to write and read. Let’s continue with the previous example to show how its RDF triples can be written in JSON-LD format. Take the following three RDF statements:

    <Bob> <is a person>.
    <Bob> <is a friend of> <Alice>.
    <Bob> <is born on> <the 4th of July 1990>.

Their formal RDF triples are:

    <http://example.org/bob#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <http://example.org/bob#me> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice#me> .
    <http://example.org/bob#me> <http://schema.org/birthDate> "1990-07-04"^^<http://www.w3.org/2001/XMLSchema#date> .

In the above example, http://www.w3.org/1999/02/22-rdf-syntax-ns#type can be shortened to rdf:type, which belongs to the core RDF vocabulary. http://schema.org/birthDate comes from another popular RDF vocabulary, schema.org. And 1990-07-04 is a literal object of the type date as defined in XML Schema.

The JSON-LD format is as follows:

    {
      "@context": {
        "foaf": "http://xmlns.com/foaf/0.1/",
        "Person": "foaf:Person",
        "knows": {
          "@id": "foaf:knows",
          "@type": "@id"
        },
        "birthdate": {
          "@id": "http://schema.org/birthDate",
          "@type": "http://www.w3.org/2001/XMLSchema#date"
        }
      },
      "@id": "http://example.org/bob#me",
      "@type": "Person",
      "birthdate": "1990-07-04",
      "knows": "http://example.org/alice#me"
    }

As illustrated above, the JSON-LD format is easy to understand. Everything is expressed in key-value pairs. The only tricky part is the context object, inside which the prefixes or namespaces are defined.

SPARQL – The Query Language

SPARQL is a recursive acronym: its full name is SPARQL Protocol and RDF Query Language. SPARQL is the query language for RDF datasets, just as SQL is the query language for relational databases. The syntax of SPARQL is similar to that of SQL, but the similarity stops there. In fact, in order to learn SPARQL quickly, one should forget what one has learned about SQL.

We can use SPARQL to query a local RDF file that holds RDF data in the form of triples (see examples later).
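SPARQL answers a query by matching triple patterns against the data, where any term beginning with a question mark is a variable. The following Python sketch imitates that idea on a small in-memory list of triples. It is not a SPARQL engine, only a minimal illustration of pattern matching; the `match` function is a hypothetical helper, and the sample triples about Bob, Alice, and Lisa are illustrative data under example.org.

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
BOB = "http://example.org/bob#me"
ALICE = "http://example.org/alice#me"
LISA = "http://example.org/lisa#me"

# Illustrative data: Bob knows Alice and Lisa.
TRIPLES = [
    (BOB, FOAF_KNOWS, ALICE),
    (BOB, FOAF_KNOWS, LISA),
]


def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern.

    Pattern terms starting with '?' are variables; all other terms must
    equal the corresponding term of the triple exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No mismatch was found, so this triple matches the pattern.
            results.append(binding)
    return results


if __name__ == "__main__":
    # Who does Bob know? Subject and predicate are fixed, object is a variable.
    for row in match((BOB, FOAF_KNOWS, "?friend"), TRIPLES):
        print(row["?friend"])
```

A real SPARQL processor does much more (joins across several patterns, filters, remote endpoints), but the core step of binding variables to the parts of matching triples is the same.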
We can also use SPARQL to query a remote RDF data store, no matter where it is on the Web, as long as that data store provides a SPARQL endpoint service. Further, we can combine any number of local and remote queries to get the data we want in our application. That is the real power of SPARQL and Linked Data.

We will start with a simple SPARQL example that demonstrates how to query a local RDF file. Suppose we have a file named bob.rdf, written in Turtle syntax, with the following content:

    (bob.rdf)
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix schema: <http://schema.org/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    <http://example.org/bob#me> rdf:type foaf:Person .
    <http://example.org/bob#me> schema:birthDate "1990-07-04"^^xsd:date .
    <http://example.org/bob#me> foaf:knows <http://example.org/alice#me> .
    <http://example.org/bob#me> foaf:knows <http://example.org/lisa#me> .

We want to use SPARQL to find all of Bob’s friends in the bob.rdf file. Here is the query. (Note: SPARQL finds the result by pattern matching. Any word beginning with a question mark is a variable.)

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?y
    FROM <bob.rdf>
    WHERE {
      <http://example.org/bob#me> foaf:knows ?y .
    }

The result, abbreviated here to the local names, will look like this:

    y
    alice
    lisa

Inside the WHERE clause, ?y is the variable, while <http://example.org/bob#me> and foaf:knows are given values. The query asks for the value in the object position of all triples whose subject is <http://example.org/bob#me> and whose predicate is foaf:knows. Hence alice and lisa.

If we want to find both the subject and the object based on the predicate value foaf:knows, we will use the following query:

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT (?a AS ?Name) (?b AS ?Friend)
    FROM <bob.rdf>
    WHERE {
      ?a foaf:knows ?b .
    }

The result:

    Name    Friend
    bob     alice
    bob     lisa

Once again, SPARQL finds the result by matching the given values in the WHERE clause and returns all values for the variables in one or more triple positions. This is just a simple introduction to SPARQL. It can get rather complicated in real applications.

INVOLVEMENT OF LIBRARY COMMUNITY

Libraries became aware of the value of Linked Data and the Semantic Web as a great way to describe library resources as early as 2005, when the US, Canada, and UK formed a joint committee to revise the AACR2 cataloging rules. The release of RDA (Resource Description and Access) in 2010 provided guidelines to catalog and describe library resources in such a way that the resulting bibliographic data will be in alignment with Linked Data, a Web standard recognized and shared by other communities on the Internet. The advantages of Linked Data are manifold, including the release of bibliographic data from silos to the web, links to resources from other communities, and retrieval of library resources by Internet search engines. According to research, 82% of information consumers begin their search with an Internet search engine and only 1% begin with a library’s website (OCLC, 2011; Wordstream, 2017). Exposure of rich library data in the Semantic Web and on the Internet will lead to more use of library resources and better services to users.

Since 2010 libraries have been vigorously pursuing the goal of transforming bibliographic data into Linked Data, which is the required format for the Semantic Web. The road is rocky and the development is slower than anticipated. One reason is that Linked Data is very new to libraries and is a drastic departure from traditional cataloging practice. Many technicalities need to be ironed out before the new practice is put into production.
The lack of participation could be another possible setback. So far only big libraries and organizations have the technical expertise and financial resources to devote to the testing and development of Linked Data projects. LC, OCLC, and other national libraries have been the leading forces in library Linked Data projects. Most small libraries are watching and waiting rather than participating. There is a lack of prototypes that demonstrate the benefits of library data as linked data, and many librarians still cannot envision how the future Semantic catalog will look and work. Transforming the sheer magnitude of data involved, about 40 years of cataloged data, into linked data is not an easy task. Library vocabularies and ontologies are complex and take a long time to complete.

In spite of the aforementioned obstacles, Linked Data is the path that libraries worldwide have chosen to follow, and they have made great progress. The following is a description of the accomplishments of the library community towards Linked Data and the Semantic Web.

Library of Congress (LC)

LC has been a world leader in promoting Linked Data technologies and their potential applications in libraries. The first move made by LC was to convert LCSH, the Name Authority File, and other controlled vocabularies into RDF statements and URIs, and thus make them ready for use by other Semantic Web applications. LC has also been instrumental in the development of the RDA cataloging rules, which are based on FRBR (Functional Requirements for Bibliographic Records) and support Linked Data. After the release of RDA in 2010, LC immediately began its work on BIBFRAME (Bibliographic Framework), a new bibliographic description standard intended to replace MARC. In late 2012 and early 2013 BIBFRAME 1.0 was released for testing in a pilot project. It included a series of tools such as the BIBFRAME Editor, the MARC to BIBFRAME Comparison Viewer, and the MARCXML to BIBFRAME Transformation Tool.
LC has been diligently testing and modifying BIBFRAME since then. This is a time-consuming and complicated process. BIBFRAME ontologies or vocabularies are the core and also the more difficult part of BIBFRAME development. Conversion of existing MARC records to BIBFRAME is another challenge. LC has 19 million MARC records (McCallum, 2017).

The latest data model and second generation of BIBFRAME is BIBFRAME 2.0, released in 2016. The revised data model includes three core categories of abstraction (work, instance, and item) and further defines three additional concepts related to the core categories: agents, subjects, and events (LC, 2016). With BIBFRAME 2.0, LC has released the MARC to BIBFRAME Conversion Specifications and Programs. However, BIBFRAME Editor 2.0 is still under construction. BIBFRAME Editor 2.0 will have more complete ontologies with classes and properties specially designed to describe library resources. The two major vocabularies or ontologies used in BIBFRAME 2.0 are BIBFRAME and MADSRDF (Metadata Authority Description Schema in RDF). In addition, BIBFRAME 2.0 also draws on a few foundation ontologies developed by the World Wide Web Consortium, including OWL (Web Ontology Language), RDFS (RDF Schema), and SKOS (Simple Knowledge Organization System). “The BIBFRAME 2 ontology is much better integrated with the RDF environment, yet it is also more in synch with the RDA cataloguing rules even while staying rule agnostic” (McCallum, 2017, p.79).

Despite the complex ontologies and vocabularies, the BIBFRAME Editor itself is a simple tool that turns bibliographic data entered through a web-based input screen into RDF statements, one of the building blocks of Linked Data. The BIBFRAME 2.0 Conversion Programs are expected to be able to process a larger number of MARC records and include fuller data from those records. It is not yet known how and where BIBFRAME data will be searched and displayed.
LC has made great progress in BIBFRAME development. It is obvious that BIBFRAME will be an ongoing project, with future revisions of vocabularies and version releases long after it is in production.

Online Computer Library Center (OCLC)

OCLC is another leading force in Linked Data research and projects in libraries. Most of the OCLC Linked Data projects revolve around WorldCat.org, a database of more than 400 million bibliographic records from more than 16,000 libraries (OCLC Linked Data Research, 2017). Collaborating with LC and other national libraries, OCLC has achieved remarkable success in this area.

The first publicly visible project OCLC undertook was to add Schema.org mark-up to its WorldCat.org records. Schema.org was created by major Internet search engines such as Google, Bing, and Yandex. It provides combined requirements and specifications for any individual or organization to follow if they want their content to be searched and displayed as linked data. “With the addition of Schema.org mark-up to all book, journal and other bibliographic resources in WorldCat.org, the entire publicly available version of WorldCat is now available for use by intelligent Web crawlers, like Google and Bing, that can make use of this metadata in search indexes and other applications” (Murphy, 2012). As the Schema.org vocabulary is general in nature and not detailed enough to describe library resources, OCLC also led and participated in the effort to reconcile the Schema.org vocabulary with the BIBFRAME vocabulary and in the development of a bibliographic extension of the Schema.org vocabulary (http://bib.schema.org/).

OCLC implemented WorldCat.org Works so that all the manifestations of the same work are linked and displayed in a cluster using the OCLC FRBR work set algorithm. “The algorithm collects bibliographic records into groups based on author and title information from bibliographic and authority records” (OCLC Linked Data Research, 2017). The Internet search
The Internet search 15Wang & Yang / International Journal of Librarianship, 3(1) engine standards are followed as “The WorldCat Work entity is based upon properties defined by the schema:CreativeWork type” (OCLC Developer Network, 2017). The advantage of gathering all formats of a work under its title is self-evident. As of July 2017, about 215 million work entities are available in Worlcat.org (OCLC Linked Data Research, 2017). OCLC Persons is a similar project except it is about person entities. “WorldCat person entities connect related information about a person into a brief description that includes various formats of the person’s name, creative works that the person has produced, and biographic sources of information about the person. As of July 2017, WorldCat persons include more than 117 million descriptions of authors, directors, musicians, and others, which have been mined directly from WorldCat. These entities were used in a Linked Data pilot program in which libraries used WorldCat persons in their regular workflows” (OCLC Linked Data Research, 2017). Virtual International Authority File (VIAF) is another successful Linked Data project initiated by OCLC and several national libraries including LC, German National Library, and French National Library. Located at https://viaf.org/, VIAF is an international authority file based on authority data from a list of national libraries and maintained by OCLC. To summarize its function, “VIAF matches and links the authority files of national libraries. It then groups all authority records for a given entity into a merged “super” authority record that brings together the different descriptions for that entity” (OCLC Linked Data Research, 2017). VIAF API allows users to search authority data by keywords, name, title, and more and retrieve authority records and relationships between authority records. VIAF is under Open Data Commons Attributions License and any individual or organization can use it. 
“VIAF has been available as Linked Data since 2009 and is now one of the most widely used Linked Data resources published by the library community” (OCLC Linked Data Research, 2017).

OCLC and LC collaborated in developing FAST (Faceted Application of Subject Terminology), a general-purpose subject heading schema derived from Library of Congress Subject Headings (LCSH). The purpose of FAST is to provide a faceted subject scheme that is simpler to use and easier to understand than LCSH. The two schemes are compatible, and LCSH headings can be converted into FAST. FAST has been available as Linked Data since 2011. FAST is known to be used by some national libraries and organizations for subject indexing and metadata description. According to OCLC, FAST is “one of the library domain’s most widely used subject vocabularies” (OCLC Linked Data Research, 2017).

The successful Linked Data projects of WorldCat Works and WorldCat Persons entities, and Schema.org Markup “helped drive more than 74 million visits to WorldCat.org in 2016 and more than 17 million visits to local library catalogs around the globe” (OCLC Linked Data Research, 2017).

Other US Library Linked Data Projects

The library Linked Data movement also comprises projects undertaken by Zepheira and many academic libraries. Eric Miller, CEO of Zepheira, a Linked Data consulting company that developed BIBFRAME 1.0, advocated immediate action by libraries to publish their data on the web so that Internet search engines can search them and display them at the top of the results page. Toward this end, Zepheira started the Libhub Initiative in 2014 and the Library.Link Network in 2016.
Partnering with vendors including EBSCO, SirsiDynix, and Innovative Interfaces, the Library.Link Network project involves a four-step process in which “Zepheira copies a partner library’s catalog, converts records into the structured BIBFRAME format, and then hosts these BIBFRAME records in the Library.Link global, shared content distribution network designed for large-scale web ingest” and “Creative Commons licensing—requiring attribution to the library—is also added to each record, ensuring that service providers such as Google and Microsoft know where the data came from and what companies are allowed to do with it” (Enis, 2016). The final step is to publish library bibliographic data, events, hours, and staff information on the web. The initial participants include public libraries. The work is in progress.

Data conversion is a key component in Linked Data development for libraries. Colorado College is leading two projects. “One is to convert not only MARC but other data they hold in formats like MODS, Dublin Core, and other XML file formats to BIBFRAME RDF for access across these files. Another converts MARC records to BIBFRAME and then converts BIBFRAME to schema.org for sending to Google” (McCallum, 2017, p. 83).

A few large academic libraries received grants from the Andrew W. Mellon Foundation for collaboration on Linked Data projects from 2014 to 2018 (LD4L Project Team, 2016). The partner universities include Cornell, Harvard, Columbia, Stanford, Princeton, and others. The projects supported by the grants include Linked Data for Libraries (LD4L), Linked Data for Libraries Labs (LD4L Labs), and Linked Data for Production (LD4P).
The goals of those projects are to create an ontology compatible with BIBFRAME and other existing ontologies for describing local scholarly collections, to develop an open source semantic system to edit, search, and display scholarly resources, to test and pilot workflow in Linked Data technical services, and to create tools and guidelines for future work. The University of California Davis Library also piloted the BIBFLOW project to study the workflow. Those efforts extend LC’s work on Linked Data and will benefit all libraries.

The National Library of Medicine (NLM) also actively participated in BIBFRAME testing and ontology development. In 2014 NLM published beta versions of two of its datasets as Linked Data: PubChemRDF, containing information on the biological activities of small molecules, and MeSH RDF, NLM’s thesaurus of Medical Subject Headings. Both RDF products are searchable from their own SPARQL query interfaces, or querying can be integrated directly into programs and services using their SPARQL endpoints (Davis Library, University of California, 2016).

Library System Vendors

BIBFRAME Editor 2.0 has still not been released. Therefore, it is hard at this stage for library system vendors to invest money and manpower in a data model that is still evolving. However, some vendors have expressed their commitment to Linked Data and their intention to incorporate BIBFRAME into their systems. A few have taken action to prepare for the BIBFRAME Editor 2.0 release.

1. Ex Libris is developing what it calls a BIBFRAME publishing feature, which will turn MARC into BIBFRAME data. The company’s roadmap for Alma includes cataloging in BIBFRAME format and discovery in Primo of materials cataloged in all formats, including those in Linked Data.
2. Innovative Interfaces, Inc. and SirsiDynix partnered with Zepheira to add functions to their existing systems that will enable libraries to transform MARC into BIBFRAME data. They will also incorporate library location data to make the display location-sensitive for patrons. They will enhance their discovery layers to discover Linked Data and connect to outside resources for enriched content.

In September of 2017 librarians from 16 European countries and the US met in Germany and discussed the barriers to implementation of BIBFRAME. They felt that the lack of interest from the vendors of Integrated Library Systems (ILS) is one of the key issues. The discussion led to the publication of “BIBFRAME Expectations for ILS Vendors” in February 2018 (Organizer Group 2018 European BIBFRAME Workshop, 2018).

Linked Data Projects in Non-US Libraries

Libraries around the world are closely watching the development of BIBFRAME 2.0 and preparing themselves for the release of the new display standard. Libraries in Europe became interested in Linked Data and Semantic Web technologies long before the US libraries. European libraries are pioneers in Semantic Web technologies. The first known library catalog that embedded Linked Data is LIBRIS, the Swedish union catalog. As early as 2008 the catalog data became available as Linked Data, and now it contains links to Wikipedia, DBpedia, LC authority files (names and subjects), and VIAF (Papadakis, Kyprianos & Stefanidakis, 2015). The British Library began to publish its British National Bibliography (Linked Open BNB) as Linked Data as early as 2011. Statistics are not available as to how it is being used. The French National Library (BNF) has been engaged in the project called “data.bnf.fr”, which aims to turn the catalog data of BNF into Linked Data. The goal of the project is to allow users to access library data on the web and to link BNF data to DBpedia, VIAF, and other sources (Papadakis, Kyprianos & Stefanidakis, 2015). The German National Library (DNB) is developing a Linked Data service as part of its long-term commitment to the Semantic Web and has been supplying its data in the RDF standard since 2010.
The National Library of Spain (BNE) has a similar project called “datos.bne.es”, which aims to release its bibliographic data as Linked Data and eventually to become part of the Semantic Web. The Canadian Linked Data Initiative (CLDI) is a collaboration between five of Canada’s largest research libraries, including the National Library of Canada (Library and Archives Canada) and Bibliothèque et Archives Nationales du Québec. The participating libraries felt they were behind in many areas for the impending shift from MARC to Linked Data and BIBFRAME. The aim of the initiative is to bring Canadian libraries up to date in strategic planning to embrace the changes in bibliographic control. The participants are discussing staff training, data preparation, an enhanced discovery process, and anything else that is necessary to prepare Canadian libraries for a smooth transition into the Linked Data world. The Japanese National Library, also called the National Diet Library (NDL), provides metadata as Linked Open Data (LOD) to facilitate effective use by computer systems or applications. The National Library of China (NLC) is vigorously engaged in research and discussions on Linked Data and Semantic Web technologies in the Chinese language environment.

CONCLUSION

It has been almost 20 years since the inception of FRBR, then RDA, and now the long-awaited BIBFRAME. The road to Linked Data has been bumpy, but there is no way to turn back. BIBFRAME will be an ongoing development even with the upcoming release of BIBFRAME Editor 2.0. We hope that in the next five to ten years most library data, including millions of bibliographic records in silos, will appear as Linked Data, freely and openly searchable and accessible on the web, as the data of many national libraries already are. Yet libraries still face the new challenge of getting bibliographic data into the search path of Internet search engines.
“With an imperative to support novel means of discovery, and a wealth of experience in producing high-quality structured data, libraries are natural complementors to Linked Data” (Heath & Bizer, 2011, p. 36). What libraries are trying to accomplish will benefit society. With that goal in mind, we will succeed. “The library community is poised to make great strides with semantic web technologies, as evidenced by recent endeavors involving BIBFRAME, a protocol that is largely considered to be the next generation standard for assigning and managing bibliographic metadata” (Johnson, Adams Becker, Estrada, & Freeman, 2015, p. 42).

References

Allemang, D., & Hendler, J. (2012). Semantic Web for the Working Ontologist. 2nd Edition. Waltham, MA: Elsevier.
Banerjee, K. (2017). Translating Technobabble: All You Really Need to Know about URIs, Linked Data, and FRBR. Computers in Libraries, 37(10), 21-24.
Berners-Lee, T., & Fischetti, M. (2000). Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web. San Francisco: HarperBusiness.
Berners-Lee, T. (2006). Linked Data. Retrieved from https://www.w3.org/DesignIssues/LinkedData.html
Coyle, K. (2012). Linked Data Tools: Connecting on the Web. Library Technology Reports, 48(4). http://dx.doi.org/10.5860/ltr.48n4
Davis Library, University of California. (2016). Survey of Current Library Linked Data Implementation. Retrieved from https://bibflow.library.ucdavis.edu/xi-survey-of-current-library-linked-data-implementation/
Enis, M. (2016, June 21). Library.Link builds open web visibility for library catalogs, events. Retrieved March 27, 2018, from Library Journal website: http://lj.libraryjournal.com/2016/06/marketing/library-link-builds-open-web-visibility-for-library-catalogs-event
Glaser, H., Millard, I., Sung, W., Lee, S., Kim, P., & You, B. (2009). Research on Linked Data and Co-reference Resolution. International Conference on Dublin Core and Metadata Applications, 0, pp. 113-117.
Retrieved from http://dcpapers.dublincore.org/pubs/article/view/958/957
Godby, C. J., & Smith-Yoshimura, K. (2017). From Records to Things: Managing the Transition from Legacy Library Metadata to Linked Data. Bulletin of the Association for Information Science and Technology, 43(2), 18-23.
Heath, T., & Bizer, C. (2011). Linked Data: Evolving the Web into a Global Data Space. Morgan & Claypool Publishers.
Jin, Q., Hahn, J., & Croll, G. (2016). BibFrame Transformation for Enhanced Discovery. LRTS, 60(4), 223-235.
Johnson, L., Adams Becker, S., Estrada, V., & Freeman, A. (2015). NMC Horizon Report: 2015 Library Edition. Austin, Texas: The New Media Consortium. Retrieved from https://www.nmc.org/publication/nmc-horizon-report-2015-library-edition/
LD4L Project Team. (2016). LD4L gateway. Retrieved March 27, 2018, from LD4L-Linked Data for Libraries website: https://www.ld4l.org/
Library Linked Data Incubator Group. (2011, October 25). Library Linked Data Incubator Group Final Report. Retrieved from https://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/
Library of Congress. (2016, April 21). Overview of the BIBFRAME 2.0 Model. Retrieved from https://www.loc.gov/bibframe/docs/bibframe2-model.html
McCallum, S. (2017). BIBFRAME Development [PDF]. JLIS.it: Italian Journal of Library, Archives, and Information Science, 8(3). https://doi.org/10.4403/jlis.it-12415
Miles, A., Matthews, B., Wilson, M., & Brickley, D. (2005). SKOS Core: Simple knowledge organisation for the Web. International Conference on Dublin Core and Metadata Applications, 0, pp. 3-10. Retrieved from http://dcpapers.dublincore.org/pubs/article/view/798
Mitchell, E. T. (2013). Library Linked Data: Research and Adoption. Library Technology Reports, 49(5). http://dx.doi.org/10.5860/ltr.49n5
Mitchell, E. T. (2016). Library Linked Data: Early Activity and Development. Library Technology Reports, 52(1). http://dx.doi.org/10.5860/ltr.52n1
Murphy, B. (2012, June 20). OCLC Adds Linked Data to WorldCat.org.
Retrieved from https://www.oclc.org/en/news/releases/2012/201238.html
OCLC. (2013). Meeting the E-Resources Challenge: An OCLC report on effective management, access and delivery of electronic collections [PDF]. Retrieved from https://www.oclc.org/content/dam/oclc/reports/pdfs/OCLC-E-Resources-Report-US.pdf
OCLC Linked Data Research. (2017). OCLC and Linked Data [PDF]. Retrieved from https://www.oclc.org/content/dam/oclc/services/brochures/215912_WWAE-OCLC-Linked-Data-Report.pdf
OCLC Developer Network. (2017). WorldCat Work Descriptions. Retrieved from https://www.oclc.org/developer/develop/linked-data/worldcat-entities/worldcat-work-entity.en.html
Organizer Group 2018 European BIBFRAME Workshop. (2018, February 8). BIBFRAME Expectations for ILS Vendors. Retrieved from https://wiki.dnb.de/display/EBW/Documents+and+Results
Papadakis, L., Kyprianos, K., & Stefanidakis, M. (2015). Linked Data URIs and Libraries: The Story So Far. D-Lib Magazine, 21(5/6). https://doi.org/10.1045/may2015-papadakis
Szeto, K. (2017). The Mystery of the Schubert Song: The Linked Data Promise. Notes, 74(1), 9-23. http://dx.doi.org/10.1353/not.2017.0071
Tharani, K. (2015). Linked Data in Libraries: A Case Study of Harvesting and Sharing Bibliographic Metadata with BibFrame. Information Technology and Libraries, 34(1). https://doi.org/10.6017/ital.v34i1.5664
Wood, D., Zaidman, M., Ruth, L., & Hausenblas, M. (2014). Linked Data: Structured Data on the Web. Shelter Island, NY: Manning Publications Co.
Wordstream. (n.d.). Google Ads: What Are Google Ads & How Do They Work? Retrieved from http://www.wordstream.com/google-ads
Zapounidou, S., Sfakakis, M., & Papatheodorou, C. (2017). Representing and Integrating Bibliographic Information into the Semantic Web: A Comparison of Four Conceptual Models. Journal of Information Science, 43(4), 525-553.
About the authors

Yongming Wang is Systems Librarian / Associate Professor at The College of New Jersey. His research interests include Linked Data, next-generation library systems, text and data analytics, digital libraries, and institutional repositories.

Sharon Q. Yang is Systems Librarian / Professor at Rider University. Her research interests include Linked Data and the Semantic Web, library systems and discovery services, institutional repositories, open access, and copyright.