

Editorial Board Thoughts
A&I Databases: the Next Frontier to Discover
Mark Dehmlow

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2015
  

I think it is fair to say that the discovery technology space is a relatively mature market segment: not complete, but mature. Much of the easy-to-negotiate content has been negotiated, and many of the systems on the market are above or approaching a billion records. This would seem a lot, but there is a whole slice of tremendously valuable content still not fully available across all platforms, namely the specialized subject abstracting and indexing (A&I) database content. This content has significant value for the discovery community: many of those databases go further back than content pulled from journal publishers or full-text databases. Equally important, they represent a substantial portion of humanities and social sciences content, which is less well represented in discovery systems than STEM content. For vendors of A&I content, the concerns are clear and realistic. Unlike journal publishers, whose metadata is meant to direct users to their main content (full text), for A&I publishers the metadata is the main content. According to a recent NFAIS report, a major concern for them is that if they include their content in discovery systems, they “risk loss of brand awareness,” the implication being that institutions will be more likely to cancel those subscriptions.1 The focus therefore seems to have been on how to optimize the visibility of their content in discovery systems before being willing to share it.
  	
  	
  

In addition to the NFAIS report, some of the conversations I have seen on the topic seem to focus on wanting discovery system providers to meet a more complex set of requirements that will make the most of the rich metadata contained in those resources, the idea being that using that metadata in specific ways will increase the visibility of the content. In principle I think it is a commendable goal to maximize the value of the comprehensive metadata A&I records contain, and the complexities of including A&I data in discovery systems need to be carefully considered, namely blending multiple subject and authority vocabularies and ensuring that metadata records are appropriately balanced with full text in the relevancy algorithm. But I also worry that setting too many requirements that are too complicated will lead to delayed access and biased search results. It is important that this content is blended in a meaningful way, but determining relevancy is a complex endeavor, and it is critically important for relevancy to be unbiased from the content provider perspective and instead focus on the user, their query, and the context of their search.
  	
  	
  

Another concern that I have heard articulated is that results in discovery services are unlikely to be as good as those in native A&I systems because of the blending issues already mentioned. This is likely to be true, but I think it is critical to focus on the purpose of discovery systems. As Donald Hawkins recently wrote in a summary of a workshop called “Information Discovery and the Future of Abstracting and Indexing Services,” “A&I services provide precision discipline-specific searching for expert researchers, and discovery services provide quick access to full text.”2 Hawkins indicates that discovery systems are not meant to be sophisticated search tools, but rather a quick means to search a broad range of scholarly resources and, I think, sometimes a quick starting point for researchers. Because of the nature of merging billions of scholarly records into a single system, discovery systems will never be able to provide the same experience as a native A&I system, nor should they. Over time, they may become better tuned to provide a better overall experience for the three different types of searchers we have in higher education: novice users like undergraduates looking for a quick resource, advanced users like graduate students and faculty looking for more comprehensive topical coverage, and expert users like librarians who want sophisticated search features to home in on the perfect few resources. Many of the discovery systems are working on building these features, but the industry will take time to solve this problem, and I tend to look at things through the lens of our end users: non-inclusion of this content directly impacts their overall discovery experience.

Mark Dehmlow (mark.dehmlow@nd.edu), a member of the ITAL Editorial Board, is Program Director, Library Information Technology, University of Notre Dame, South Bend, IN.
  

One might ask, if the discovery system experience isn’t as precise and complete as the native A&I experience, why bother? In addition to broadening the subject scope by including much of the narrower and deeper subject metadata, there is also the importance of serendipitous finding. That content, in the context of a quick user search, may drive the user to just the right thing that they need. In addition, my belief is that with that content, we can build search systems that are deeper than Google Scholar, and by extension provide our end users with a superior search experience. And so I advocate for innovating now instead of waiting to work out all of the details. I am not suggesting moving forward callously, but swiftly. The work that NISO has done on the Open Discovery Initiative has resulted in some good recommendations about how to proceed. For example, they have suggested two usage metrics that could be valuable for measuring A&I content use in discovery systems: search counts (by collection and customer for A&I databases) and results clicks (number of times an end user clicks on a content provider’s content in a set of results).3
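To make the two metrics concrete, here is a minimal sketch of how they might be tallied from a discovery system's usage log. The `UsageEvent` record, its field names, and the sample data are all assumptions made for illustration; they are not taken from the NISO recommendations or from any vendor's reporting format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One hypothetical log record from a discovery system."""
    kind: str        # "search" or "click" (assumed event types)
    customer: str    # subscribing institution
    collection: str  # A&I database the event is attributed to
    provider: str    # content provider that owns the collection


def search_counts(events):
    """Count searches per (customer, collection) pair."""
    return Counter(
        (e.customer, e.collection) for e in events if e.kind == "search"
    )


def result_clicks(events):
    """Count clicks on results, grouped by content provider."""
    return Counter(e.provider for e in events if e.kind == "click")


# Invented sample data, for illustration only.
sample_events = [
    UsageEvent("search", "Library A", "Index One", "Provider X"),
    UsageEvent("search", "Library A", "Index One", "Provider X"),
    UsageEvent("search", "Library B", "Index Two", "Provider Y"),
    UsageEvent("click", "Library A", "Index One", "Provider X"),
]

print(search_counts(sample_events))
print(result_clicks(sample_events))
```

Even in this toy form, the search tally records only that a collection was queried, while the click tally at least records that a result drew a user onward.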
  	
  

While I think these types of metrics are aligned with the measures by which libraries evaluate A&I database usage, I think at the same time they don’t really say much about the overall value of the resources themselves. Sometimes in the library profession, our obsession with counting stuff loses connection with collecting metrics that actually say something about impact. Of the two counts, I could see result clicks as perhaps having more value. In this instance, knowing that a user found something of interest from a specific resource at the very least indicates that it led the user somewhere. I think the measure of search counts by collection is less useful. At best it indicates that the resource was searched, but it tells us nothing about who was searching for an item, what they found, or what they subsequently did with the item once they found it. I do think we in libraries need to consider the bigger picture. Regardless of the number of searches (which doesn’t really tell us anything anyway), we need to recognize the value of simply including the A&I content, and instead of trying to determine the value of the resource by the number of times it was searched, focus more on the breadth of exposure that content is getting by inclusion in the discovery system.
  

I think a more useful technical requirement for discovery providers would be to provide pathways to specific A&I resources within the context of a user’s search, not unlike how Google places sponsored content at the top of its search results: a kind of promotional widget. In this case, using metadata returned from the query, the systems could calculate which one or two specific resources would guide the user to more in-depth research. By virtue of their inclusion in the discovery system, those resources could become part of the promotional widget. This would guide users back to the native A&I resource, which both libraries and A&I providers want, and it would do that in a more intuitive and meaningful way for the end user.
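As a rough illustration of the calculation such a widget might perform, the sketch below tallies which licensed A&I databases contributed the most records to a result set and suggests the top one or two. The `source` key, the `licensed` set, and the database names are hypothetical; a real discovery system would likely weigh subject headings, rank position, and other signals rather than a simple count.

```python
from collections import Counter


def suggest_resources(results, licensed, limit=2):
    """Pick up to `limit` A&I databases to promote for one query.

    results  -- records returned for the query, each a dict with a
                hypothetical "source" key naming the contributing database
    licensed -- set of A&I databases the library subscribes to
    """
    tally = Counter(
        r["source"] for r in results if r.get("source") in licensed
    )
    return [name for name, _ in tally.most_common(limit)]


# Invented result set: only the contributing source of each record matters.
sample_results = (
    [{"source": "Index A"}] * 3
    + [{"source": "Index B"}] * 2
    + [{"source": "Index C"}]
    + [{"source": "Journal Publisher Feed"}] * 4  # not an A&I database
)
licensed_ai = {"Index A", "Index B", "Index C"}

print(suggest_resources(sample_results, licensed_ai))
```

Counting contributing records is only the simplest possible signal. The point of the sketch is that the suggestion is derived from the user's own query results rather than from a provider's preference.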
  

All of the parties involved in the discovery discussion can bring something to the table if we want to solve these issues in a timely way. I hope that A&I publishers and discovery system providers make haste and get agreements underway for content sharing, and I would recommend that instead of requiring finished implementations based on complex requirements before loading content, both of them focus on some achievable short- and long-term goals. Integrating A&I content perfectly will take some time to complete, and the longer we wait, the longer our users have a sub-optimal discovery experience. Discovery providers need to make long-term commitments to developing mechanisms that satisfy usage metrics for A&I content, although I would recommend defining measures that have true value. A&I providers should be measured in their demands: while their stake in system integration is real, there is a risk of content providers vying for their content to be preferred when relevancy neutrality is paramount for a discovery system to be effective. I think it is worth lauding the efforts of a few trailblazing A&I publishers, such as Thomson Reuters and ProQuest, who have made agreements with some of the discovery providers and are already sharing their A&I content, providing some precedent for others. Lastly, libraries and knowledge workers need to develop better means for calculating overall resource value, moving beyond strict counts to thinking of ways to determine the overall scholarly and pedagogical impact of those resources. They also need to treat the very fact that an A&I publisher shares its data with a discovery provider as an indicator of significant value for the resource.
  	
  

	
  
	
  
	
  
	
  
	
  
	
  
	
  



	
  
	
  

  

	
  
REFERENCES	
  
	
  
1. NFAIS, Recommended Practices: Discovery Systems. NFAIS, 2013. https://nfais.memberclicks.net/assets/docs/BestPractices/recommended_practices_final_aug_2013.pdf.

2. Hawkins, Donald T., “Information Discovery and the Future of Abstracting and Indexing Services: An NFAIS Workshop.” Against the Grain, 2013. http://www.against-the-grain.com/2013/08/information-discovery-and-the-future-of-abstracting-and-indexing-services-an-nfais-workshop/.

3. Open Discovery Initiative Working Group, Open Discovery Initiative: Promoting Transparency in Discovery. Baltimore: NISO, 2014. http://www.niso.org/apps/group_public/download.php/13388/rp-19-2014_ODI.pdf.