Ariadne

Librarians and librarianship in the Age of the Internet

This is a distant reading of an electronic journal called Ariadne. The journal ran from 1996 to 2019. Obviously, it is no longer in publication, but, in its time, it was a pioneering publication on the topic of digital libraries.

This data set was created by first using wget to mirror the whole of the journal. I then created a list of the journal's RIS files, which are rich in metadata. Next, I wrote a script -- ris2csv.py -- which looped through each RIS file and output a metadata file suitable for my Reader to build the data set. In the process, the metadata file gets cached as index.csv.
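The conversion step can be sketched along the following lines. This is not the actual ris2csv.py, which is not reproduced here; it is a minimal illustration that assumes the usual RIS layout of a two-letter tag, two spaces, a hyphen, a space, and then a value. The column names (author, title, date, url) and the sample record are hypothetical, and the columns the Reader really expects may differ.

```python
import csv
import io

# Hypothetical mapping from RIS tags to CSV columns; the real
# ris2csv.py and the Reader's expected columns may differ.
TAGS = {"AU": "author", "TI": "title", "PY": "date", "UR": "url"}

def parse_ris(text):
    """Return a dict of selected fields from one RIS record."""
    record = {column: [] for column in TAGS.values()}
    for line in text.splitlines():
        # RIS lines look like "TI  - Some title"; skip anything else
        if len(line) < 6 or line[2:6] != "  - ":
            continue
        tag, value = line[:2], line[6:].strip()
        if tag in TAGS and value:
            record[TAGS[tag]].append(value)
    # join repeated tags, such as multiple authors, with semicolons
    return {column: "; ".join(values) for column, values in record.items()}

def ris_to_csv(records):
    """Serialize parsed records as CSV text with a header row."""
    handle = io.StringIO()
    writer = csv.DictWriter(handle, fieldnames=list(TAGS.values()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return handle.getvalue()

# An invented record, for illustration only.
SAMPLE = """TY  - JOUR
AU  - Doe, Jane
AU  - Roe, Richard
TI  - An Example Article
PY  - 1996
ER  - 
"""

print(ris_to_csv([parse_ris(SAMPLE)]))
```

In practice one would call parse_ris once per mirrored RIS file and write the combined result to a file such as index.csv.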

After a bit of modeling, I determined the data set includes 1,700 files for a total of 4 million words. Forms of these files can be found in the data set's cache directory and plain text directory. Both the human-readable index and JSON index describe each of these files in greater detail. Such is traditional bibliographic reading. Based on your experience, to what degree is this data set larger or smaller than other library collections? What themes might Ariadne be about?

Word clouds illustrating the frequency of unigrams, bigrams, and keywords somewhat connote the aboutness of the journal. "A picture is worth a thousand words"? Given such images, how would you begin to characterize Ariadne?

[Figures: word clouds of unigrams, bigrams, and keywords]
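The frequencies behind word clouds like these can be tabulated with a few lines of Python. The following is a minimal sketch, not the Reader's own implementation; the stop word list is deliberately tiny and the sample text is invented.

```python
import re
from collections import Counter

# A tiny, illustrative stop word list; real lists are much longer.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "on"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def ngrams(tokens, n):
    """Return the list of n-grams, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Invented sample text, for illustration only.
text = "Digital libraries and the web. Digital libraries on the internet."
tokens = tokenize(text)
unigrams = Counter(tokens)
bigrams = Counter(ngrams(tokens, 2))

print(unigrams.most_common(3))
print(bigrams.most_common(1))
```

Note that this sketch removes stop words before pairing tokens, so a bigram may span a removed word or a sentence boundary; whether that is desirable depends on the question being asked.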

The association of keywords with articles can be modeled as a network graph. Keywords and articles are nodes, and edges are the associations between keywords and articles. After modeling the articles and their keywords as a graph, and then visualizing the result, the graph might look like the following. The emphasized keywords echo the frequencies of the keywords (above), but their relationships with the articles and other keywords are brought to light. Given the visualization, we might say Ariadne is about information, web, project, libraries, management, and metadata.

[Figure: network clusters]
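The graph model itself is simple enough to sketch with plain dictionaries. In the example below, the article identifiers and keyword assignments are hypothetical, and ranking keywords by degree (the number of articles they touch) is only one of many ways to decide which nodes to emphasize; the visualization above may have used a different measure.

```python
from collections import defaultdict

# Hypothetical article-to-keyword assignments, for illustration only.
ARTICLES = {
    "article-1": ["information", "web", "metadata"],
    "article-2": ["web", "libraries"],
    "article-3": ["information", "web", "project"],
}

def build_graph(articles):
    """Return an undirected adjacency map linking articles and keywords."""
    graph = defaultdict(set)
    for article, keywords in articles.items():
        for keyword in keywords:
            graph[article].add(keyword)
            graph[keyword].add(article)
    return graph

def keyword_degrees(graph, articles):
    """Rank keyword nodes by degree, i.e. the number of linked articles."""
    degrees = {node: len(edges) for node, edges in graph.items() if node not in articles}
    # sort by descending degree, then alphabetically to break ties
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))

graph = build_graph(ARTICLES)
ranked = keyword_degrees(graph, ARTICLES)
print(ranked)
```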

If Ariadne is about these things, then what articles manifest those ideas to the greatest degree? To address this question, I searched the data set's underlying relational database for articles whose keywords included information, web, and projects. Twenty-three articles presented themselves, and they included the following eight most relevant articles as measured by a form of TF-IDF:

  1. DESIRE: Development of a European Service for Information on Research and Education (1996)
  2. Delivering the Electronic Library: The ARIADNE Reader (1999)
  3. Understanding the Searching Process for Visually Impaired Users of the Web (NoVA) (2001)
  4. Elvira 4: May 1997, Milton Keynes (1997)
  5. Editorial Introduction to Issue 25: Beyond the Web Site (2000)
  6. ADAM: Information Gateway to Resources on the Internet in Art, Design, Architecture and Media (1996)
  7. News and Events (2006)
  8. Another Piece of Cake? (2002)
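Relevance ranking of this kind can be illustrated with a textbook formulation of TF-IDF. The exact variant used to produce the list above is not specified, so the sketch below is an assumption: term frequency is the term's share of a document's tokens, inverse document frequency is the natural log of the number of documents divided by the number of documents containing the term, and a document's score is the sum over the query terms. The three tiny documents are invented.

```python
import math
from collections import Counter

def tfidf_rank(documents, query):
    """Score each document by the sum of TF-IDF weights of the query terms."""
    total = len(documents)
    counts = {name: Counter(tokens) for name, tokens in documents.items()}
    scores = {}
    for name, tokens in documents.items():
        score = 0.0
        for term in query:
            # document frequency: how many documents contain the term
            df = sum(1 for c in counts.values() if term in c)
            if df == 0 or not tokens:
                continue
            tf = counts[name][term] / len(tokens)
            idf = math.log(total / df)
            score += tf * idf
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented documents, for illustration only.
DOCS = {
    "a": "information web project web".split(),
    "b": "metadata records dublin core".split(),
    "c": "web sites links".split(),
}
ranked = tfidf_rank(DOCS, ["information", "web", "project"])
print(ranked)
```

With this formulation, a term appearing in every document contributes nothing to any score, which is why very common words do little to distinguish one article from another.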

Topic modeling is an additional technique used to model the aboutness of a corpus. Based on the output of rudimentary principal component analysis, I determined topic modeling for twelve topics might be apropos. (See the dendrogram.) After doing so, I believe Ariadne can be clustered into the following twelve themes, which can be visualized as a pie chart. I removed the less significant topics and plotted how the topics ebbed and flowed over time, resulting in a line chart. From these results we can see the topics echo the frequencies and network graph, above. But upon closer examination, we can see the "book" topic grew over time, where the "book" topic is more like a hyphenated word, "information-book-web-learning-students-research-knowledge-technology-review-work-author-authors".

labels        weights  features
information   0.50932  information services libraries project access resources service learning staff development research users
search        0.33581  search web people google time information internet engines just results like good
book          0.29595  information book web learning students research knowledge technology review work author authors
system        0.19782  system software users web content server access data systems database interface figure
conference    0.16675  conference web workshop session information day event research presentation sessions back management
access        0.16012  access copyright journal journals publishers publishing research authors material rights published papers
internet      0.15537  information internet resources web subject service search resource eevl database sites engineering
metadata      0.12738  metadata information resource services data core description dublin resources identifier service records
preservation  0.1116   preservation archives project archive collections national data images material content records digitisation
data          0.08915  data research repositories researchers project institutional management science infrastructure support access information
web           0.07702  web html pages check sites links accessibility browser view college main url
text          0.05187  text web people texts language books users information english blind world internet
[Figures: topics (pie chart) and topics over time (line chart)]
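One way to see how topics ebb and flow is to average each topic's weight across the documents published in a given year. The following is a minimal sketch of that aggregation, not necessarily how the line chart was produced; the years and weights are invented, and it assumes each document already carries a publication year and a set of per-topic weights.

```python
from collections import defaultdict

# Hypothetical (year, {topic: weight}) pairs, one per document.
DOC_TOPICS = [
    (1996, {"information": 0.7, "book": 0.1}),
    (1996, {"information": 0.5, "book": 0.3}),
    (2010, {"information": 0.2, "book": 0.6}),
]

def topics_over_time(doc_topics):
    """Return {year: {topic: mean weight}} across each year's documents."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for year, weights in doc_topics:
        counts[year] += 1
        for topic, weight in weights.items():
            sums[year][topic] += weight
    # divide each yearly sum by the number of documents in that year
    return {
        year: {topic: total / counts[year] for topic, total in topics.items()}
        for year, topics in sorted(sums.items())
    }

trend = topics_over_time(DOC_TOPICS)
for year, topics in trend.items():
    print(year, topics)
```

Plotting one line per topic, with years on the horizontal axis and mean weights on the vertical axis, yields a chart of the kind described above.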

If topic modeling predicts the most significant topic is "information services libraries project access...", then what articles include those themes? Luckily, the Reader's underlying topic modeling tool (MALLET) saves this information, albeit deep in its bowels. Here are the eight most significant articles on the topic of "information services libraries project access...", and alas, no, these eight articles do not overlap with the other eight articles listed above. More on that, later.

  1. ./cache/core-edulib_6-1996.html
  2. ./cache/eve-vital_587-1999.html
  3. ./cache/smith-catriona_503-1998.html
  4. ./cache/pinfield-the_687-2001.html
  5. ./cache/tiley-tltp_95-1996.html
  6. ./cache/macdougall-supporting_469-1998.html
  7. ./cache/williams-jisc_564-1999.html
  8. ./cache/huntingford-the_399-1998.html
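A list like the one above can be derived from a per-document table of topic proportions. The sketch below assumes a tab-delimited layout of document index, document name, and then one proportion per topic, which is my understanding of what recent versions of MALLET write when asked to output document topics; older versions use a different layout, and where the Reader stores this file is not shown here. The sample lines and file names are invented.

```python
def top_documents(lines, topic, n=8):
    """Return the n (name, proportion) pairs with the largest share of a topic."""
    rows = []
    for line in lines:
        # skip blank lines and any commented header line
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # assumed layout: index, name, then one proportion per topic
        name, proportions = fields[1], [float(v) for v in fields[2:]]
        rows.append((name, proportions[topic]))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]

# Invented sample lines, for illustration only.
SAMPLE = [
    "0\tfile:/cache/a.html\t0.7\t0.2\t0.1\n",
    "1\tfile:/cache/b.html\t0.1\t0.8\t0.1\n",
    "2\tfile:/cache/c.html\t0.9\t0.05\t0.05\n",
]

top = top_documents(SAMPLE, topic=0, n=2)
print(top)
```

Against a real file, one would pass an open file handle as the lines argument and the zero-based index of the topic of interest.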

Eric Lease Morgan <eric_morgan@infomotions.com>
November 3, 2025