Editorial Board Thoughts
A&I Databases: the Next Frontier to Discover
Mark Dehmlow
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2015

Mark Dehmlow (mark.dehmlow@nd.edu), a member of the ITAL Editorial Board, is Program Director, Library Information Technology, University of Notre Dame, South Bend, IN.

I think it is fair to say that the discovery technology space is a relatively mature market segment: not complete, but mature. Much of the easy-to-negotiate content has been negotiated, and many of the systems on the market hold close to or more than a billion records. That may seem like a lot, but a whole slice of tremendously valuable content is still not fully available across all platforms, namely the content of specialized subject abstracting and indexing (A&I) databases. This content has significant value for the discovery community: many of those databases go further back than content pulled from journal publishers or full-text databases. Equally important, they represent a substantial portion of humanities and social sciences content, which is less well represented in discovery systems than STEM content.

For vendors of A&I content, the concerns are clear and realistic. Unlike journal publishers, whose metadata is meant to direct users to their main content (full text), for A&I publishers the metadata is the main content. According to a recent NFAIS report, a major concern for them is that if they include their content in discovery systems, they “risk loss of brand awareness,” and the implication is that institutions will be more likely to cancel those subscriptions.1 The focus therefore seems to have been on how to optimize the visibility of their content in discovery systems before being willing to share it.

In addition to the NFAIS report, some of the conversations I have seen on the topic seem to focus on wanting discovery system providers to meet a more complex set of requirements that makes the most of the rich metadata contained in those resources, the idea being that using that metadata in specific ways will increase the visibility of the content. In principle, I think it is a commendable goal to maximize the value of the comprehensive metadata A&I records contain, and the complexities of including A&I data in discovery systems need to be carefully considered, namely blending multiple subject and authority vocabularies and ensuring that metadata records are appropriately balanced with full text in the relevancy algorithm. But I also worry that setting too many overly complicated requirements will lead to delayed access and biased search results. It is important that this content is blended in a meaningful way, but determining relevancy is a complex endeavor, and it is critically important that relevancy not be biased toward any content provider and instead focus on the user, their query, and the context of their search.

Another concern that I have heard articulated is that results in discovery services are unlikely to be as good as those in native A&I systems because of the blending issues already mentioned. This is likely to be true, but I think it is critical to focus on the purpose of discovery systems.
As Donald Hawkins recently wrote in a summary of a workshop called “Information Discovery and the Future of Abstracting and Indexing Services,” “A&I services provide precision discipline-specific searching for expert researchers, and discovery services provide quick access to full text.”2 Hawkins indicates that discovery systems are not meant to be sophisticated search tools but rather a quick means of searching a broad range of scholarly resources and, I think, sometimes a quick starting point for researchers. Because they merge billions of scholarly records into a single system, discovery systems will never be able to provide the same experience as a native A&I system, nor should they. Over time, they may become better tuned to serve the three different types of searchers we have in higher education: novice users like undergraduates looking for a quick resource, advanced users like graduate students and faculty looking for more comprehensive topical coverage, and expert users like librarians who want sophisticated search features to home in on the perfect few resources. Many of the discovery systems are working on building these features, but the industry will take time to solve this problem, and I tend to look at things through the lens of our end users: non-inclusion of this content directly impacts their overall discovery experience.

One might ask, if the discovery system experience isn’t as precise and complete as the native A&I experience, why bother? In addition to broadening the subject scope by including much of the narrower and deeper subject metadata, there is also the importance of serendipitous finding. That content, in the context of a quick user search, may drive the user to just the thing they need. In addition, my belief is that with that content we can build search systems that are deeper than Google Scholar and, by extension, provide our end users with a superior search experience.
And so I advocate for innovating now instead of waiting to work out all of the details. I am not suggesting moving forward carelessly, but swiftly. The work that NISO has done on the Open Discovery Initiative has resulted in some good recommendations about how to proceed. For example, it has suggested two usage metrics that could be valuable for measuring A&I content use in discovery systems: search counts (by collection and customer for A&I databases) and results clicks (the number of times an end user clicks on a content provider’s content in a set of results).3 While I think these metrics align with the measures by which libraries evaluate A&I database usage, I also think they don’t really say much about the overall value of the resources themselves. Sometimes in the library profession, our obsession with counting things loses its connection to collecting metrics that actually say something about impact. Of the two counts, I could see result clicks having more value: knowing that a user found something of interest from a specific resource at the very least indicates that it led the user somewhere. I think the measure of search counts by collection is less useful. At best it indicates that the resource was searched, but it tells us nothing about who was searching, what they found, or what they subsequently did with the item once they found it.

I do think we in libraries need to consider the bigger picture. Regardless of the number of searches (which doesn’t really tell us anything anyway), we need to recognize the inherent value of including the A&I content and, instead of trying to determine the value of the resource by the number of times it was searched, focus more on the breadth of exposure that content gets by inclusion in the discovery system.
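To make the two usage metrics discussed above concrete, the following is a minimal sketch of how search counts and result clicks might be tallied from a discovery system's event log. The `UsageEvent` record, its field names, the event kinds, and the sample data are all hypothetical; the NISO recommendation does not prescribe a log format, and keying result clicks by collection rather than by content provider is a simplifying assumption.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class UsageEvent(NamedTuple):
    """One logged event (hypothetical schema, not taken from the ODI document)."""
    kind: str        # "search" or "click" (assumed event kinds)
    customer: str    # subscribing library
    collection: str  # A&I database the event is attributed to


def tally_usage(events: Iterable[UsageEvent]):
    """Return (search_counts, result_clicks), each keyed by (customer, collection)."""
    search_counts: Counter = Counter()
    result_clicks: Counter = Counter()
    for event in events:
        key = (event.customer, event.collection)
        if event.kind == "search":
            search_counts[key] += 1
        elif event.kind == "click":
            result_clicks[key] += 1
    return search_counts, result_clicks


# Made-up sample log for illustration only.
sample_log = [
    UsageEvent("search", "Library A", "HistoryIndex"),
    UsageEvent("search", "Library A", "HistoryIndex"),
    UsageEvent("click", "Library A", "HistoryIndex"),
    UsageEvent("search", "Library B", "HistoryIndex"),
]

search_counts, result_clicks = tally_usage(sample_log)
print(search_counts[("Library A", "HistoryIndex")])  # 2
print(result_clicks[("Library A", "HistoryIndex")])  # 1
```

Note that neither tally records who searched or what the user did after clicking, which is the limitation raised above: the counts are easy to produce but say little about impact.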
I think a more useful technical requirement for discovery providers would be to provide pathways to specific A&I resources within the context of a user’s search, a kind of promotional widget not dissimilar to how Google places sponsored content at the top of its search results. In this case, using metadata returned from the query, the system could calculate which one or two specific resources would guide the user to more in-depth research. By virtue of their inclusion in the discovery system, those resources could become part of the promotional widget. This would guide users back to the native A&I resource, which both libraries and A&I providers want, and it would do so in a more intuitive and meaningful way for the end user.

All of the parties involved in the discovery discussion can bring something to the table if we want to solve these issues in a timely way. I hope that A&I publishers and discovery system providers make haste and get content-sharing agreements underway, and I would recommend that, instead of requiring finished implementations based on complex requirements before loading content, both focus on some achievable short- and long-term goals. Integrating A&I content perfectly will take some time, and the longer we wait, the longer our users have a suboptimal discovery experience. Discovery providers need to make long-term commitments to developing mechanisms that satisfy usage metrics for A&I content, although I would recommend defining measures that have true value. A&I providers should be measured in their demands: while their stakes in system integration are real, there is a risk of content providers vying for their content to be preferred when relevancy neutrality is paramount for a discovery system to be effective.
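As one illustration of the promotional widget idea described earlier, the sketch below picks the one or two A&I databases whose subject coverage best matches the subject terms found in a result set. Everything here is an assumption made for illustration: the function name `recommend_databases`, the idea of representing each database by a set of subject terms, the simple overlap score, and the database names are hypothetical, and a real discovery system would likely use richer metadata and vocabulary mapping.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend_databases(
    result_subjects: List[Set[str]],
    coverage: Dict[str, Set[str]],
    limit: int = 2,
) -> List[str]:
    """Suggest up to `limit` databases for the widget.

    result_subjects: one set of subject terms per record in the result list.
    coverage: hypothetical map of database name -> subject terms it covers.
    """
    # Count how often each subject term appears across the result set.
    subject_counts = Counter(
        term for subjects in result_subjects for term in subjects
    )
    # Score each database by the total frequency of the terms it covers.
    scores = {
        db: sum(subject_counts[term] for term in terms)
        for db, terms in coverage.items()
    }
    # Keep only databases with some overlap; break ties alphabetically.
    ranked = sorted(
        (db for db, score in scores.items() if score > 0),
        key=lambda db: (-scores[db], db),
    )
    return ranked[:limit]


# Made-up result set and coverage map for illustration only.
results = [{"psychology", "memory"}, {"psychology"}, {"education"}]
coverage = {
    "PsychDB": {"psychology", "memory"},
    "EduDB": {"education"},
    "ChemDB": {"chemistry"},
}

widget = recommend_databases(results, coverage)
print(widget)  # ['PsychDB', 'EduDB']
```

Because the suggestion is computed from the user's own results rather than from any provider preference, a sketch like this leaves the main relevancy ranking untouched, in keeping with the relevancy neutrality argued for above.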
I think it is worth lauding the efforts of a few trailblazing A&I publishers, such as Thomson Reuters and ProQuest, who have made agreements with some of the discovery providers and are already sharing their A&I content, setting a precedent for others. Lastly, libraries and knowledge workers need to develop better means of calculating overall resource value, moving beyond strict counts to ways of determining the overall scholarly and pedagogical impact of those resources, and they need to treat the very fact that an A&I publisher shares its data with a discovery provider as an indicator of significant value for the resource.

REFERENCES

1. NFAIS, Recommended Practices: Discovery Systems. NFAIS, 2013. https://nfais.memberclicks.net/assets/docs/BestPractices/recommended_practices_final_aug_2013.pdf.
2. Hawkins, Donald T., “Information Discovery and the Future of Abstracting and Indexing Services: An NFAIS Workshop.” Against the Grain, 2013. http://www.against-the-grain.com/2013/08/information-discovery-and-the-future-of-abstracting-and-indexing-services-an-nfais-workshop/.
3. Open Discovery Initiative Working Group, Open Discovery Initiative: Promoting Transparency in Discovery. Baltimore: NISO, 2014. http://www.niso.org/apps/group_public/download.php/13388/rp-19-2014_ODI.pdf.