TL;DNR: Distant Reader study carrels are platform- and network-independent data sets, and they lend themselves to addressing all sorts of research questions.
The Distant Reader and the Distant Reader Toolbox take an arbitrary amount of text as input, process it, and output data sets called "study carrels". The processing includes many steps: 1) converting the original documents into plain text, 2) extracting features (email addresses, URLs, bibliographics, parts-of-speech, named-entities, and keywords), and 3) reducing the whole to a relational database. By exploiting these data sets, all sorts of interesting research questions can be addressed.
Take a few minutes to familiarize yourself with the content of the study carrel, below, and then continue with the tutorial, even further below.
Name              Last modified     Size  Description
Parent Directory                    -
adr/              2022-11-09 12:40  -     email addresses
bib/              2022-11-09 12:44  -     bibliographics
cache/            2022-11-09 13:59  -     original items
ent/              2022-11-09 13:26  -     named-entities
etc/              2023-04-20 14:18  -     models
figures/          2022-11-09 14:01  -     visualizations
index.htm         2023-05-01 18:04  6.9K  computed home page
metadata.csv      2022-11-06 20:33  5.3K  item, author, title, and/or date data
pos/              2022-11-06 20:37  -     parts-of-speech
provenance.tsv    2022-11-06 20:33  71    publishing data
txt/              2022-11-09 14:23  -     plain text versions of original items
urls/             2022-11-06 20:37  -     URLs
wrd/              2022-11-06 20:38  -     statistically significant keywords
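A listing like the one above can be checked against a local copy of a carrel. The following is a minimal sketch, not part of the Distant Reader itself: it assumes the carrel has been downloaded to a local directory (the default path `carrel` is hypothetical), and the function name `inventory` is made up for illustration. The directory and file names are the ones shown in the listing.

```python
from pathlib import Path

# Names taken from the directory listing above.
EXPECTED_DIRECTORIES = ["adr", "bib", "cache", "ent", "etc",
                        "figures", "pos", "txt", "urls", "wrd"]
EXPECTED_FILES = ["index.htm", "metadata.csv", "provenance.tsv"]


def inventory(carrel):
    """Map each expected directory or file name to True if it exists in the carrel."""
    root = Path(carrel)
    found = {}
    for name in EXPECTED_DIRECTORIES:
        found[name + "/"] = (root / name).is_dir()
    for name in EXPECTED_FILES:
        found[name] = (root / name).is_file()
    return found


if __name__ == "__main__":
    # "carrel" is a hypothetical local path; substitute the location of a real carrel.
    for name, present in inventory("carrel").items():
        print(f"{name:16} {'present' if present else 'missing'}")
```

Because a carrel is just a directory of ordinary files, nothing beyond the standard library is needed to inspect it.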
Every study carrel has a very similar structure:
cache - the original documents
txt - the original documents converted into plain text
adr, bib, ent, pos, urls, and wrd - tab-separated files of extracted features
figures - visualizations
etc - models
provenance.tsv - the most rudimentary of publishing information
Study carrels may contain additional files, and some of the more common are:
index.htm - a computed home page, a sort of summary or overview of a carrel's contents
metadata.csv - a mapping of original file names to author, title, and date values
etc/reader.zip - the whole of a study carrel compressed into a single file
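Since the extracted features are stored as tab-separated files, they can be read with ordinary tools. The sketch below, written under stated assumptions, counts the values found in one column across every file in a feature directory. It assumes that each file begins with a header row naming its columns; the function name `tally`, the path `carrel/wrd`, and the column name `keyword` are hypothetical and should be replaced with whatever a given carrel actually contains.

```python
import csv
from collections import Counter
from pathlib import Path


def tally(directory, column):
    """Count the values found in one column across every file in a directory."""
    counts = Counter()
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            # Assumption: the first row of each file names its columns.
            reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                value = row.get(column)
                if value:
                    counts[value] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path and column name; adjust both to match a real carrel.
    directory = Path("carrel/wrd")
    if directory.is_dir():
        for value, count in tally(directory, "keyword").most_common(10):
            print(f"{count}\t{value}")
```

The same function could be pointed at any of the other feature directories (adr, bib, ent, pos, or urls) by changing the directory and column arguments.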
Eric Lease Morgan <emorgan@nd.edu>
November 9, 2022