TL;DNR Distant Reader study carrels are platform- and network-independent data sets, and they lend themselves to addressing all sorts of research questions.
The Distant Reader and the Distant Reader Toolbox take an arbitrary amount of text as input, process it, and output data sets called "study carrels". The processing includes many steps: 1) converting the original documents into plain text, 2) extracting features (email addresses, URLs, bibliographics, parts-of-speech, named-entities, and keywords), and 3) reducing the whole to a relational database. By exploiting these data sets, all sorts of interesting research questions can be addressed.
Take a few minutes to familiarize yourself with the content of the study carrel, below, and then continue with the tutorial, even further below.
Name              Last modified     Size  Description
Parent Directory                    -
adr/              2022-11-09 12:40  -     email addresses
bib/              2022-11-09 12:44  -     bibliographics
cache/            2022-11-09 13:59  -     original items
ent/              2022-11-09 13:26  -     named-entities
etc/              2023-04-20 14:18  -     models
figures/          2022-11-09 14:01  -     visualizations
index.htm         2023-05-01 18:04  6.9K  computed home page
metadata.csv      2022-11-06 20:33  5.3K  item, author, title, and/or date data
pos/              2022-11-06 20:37  -     parts-of-speech
provenance.tsv    2022-11-06 20:33  71    publishing data
txt/              2022-11-09 14:23  -     plain text versions of original items
urls/             2022-11-06 20:37  -     URL's
wrd/              2022-11-06 20:38  -     statistically significant keywords
Every study carrel has a very similar structure:
cache) containing the original documents
txt) containing the original documents converted into plain text
adr, bib, ent, pos, urls, wrd) containing tab-separated files of extracted features
figures) containing visualizations
etc) containing models
provenance.tsv) containing the most rudimentary of publishing information
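Because the structure is so regular, a carrel can be checked programmatically before any analysis begins. The following is a minimal sketch in Python, not part of the Distant Reader Toolbox itself. The directory and file names come from the listing above, while the function names (`survey_carrel`, `missing_items`) are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Names taken from the directory listing above; any given carrel may
# contain additional items.
EXPECTED_DIRECTORIES = ["adr", "bib", "cache", "ent", "etc",
                        "figures", "pos", "txt", "urls", "wrd"]
EXPECTED_FILES = ["provenance.tsv"]

# Commonly found, but not guaranteed to be present.
OPTIONAL_FILES = ["index.htm", "metadata.csv", "etc/reader.zip"]


def survey_carrel(carrel):
    """Map each expected name to True or False (present or absent)."""
    root = Path(carrel)
    report = {}
    for name in EXPECTED_DIRECTORIES:
        report[name + "/"] = (root / name).is_dir()
    for name in EXPECTED_FILES + OPTIONAL_FILES:
        report[name] = (root / name).is_file()
    return report


def missing_items(carrel):
    """List the expected (non-optional) directories and files that are absent."""
    report = survey_carrel(carrel)
    required = [name + "/" for name in EXPECTED_DIRECTORIES] + EXPECTED_FILES
    return [name for name in required if not report[name]]
```

Calling `missing_items("path/to/carrel")` on a complete carrel should return an empty list; anything it does return points at a directory or file worth investigating before going further.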
Study carrels may contain additional files, and some of the more common are:
index.htm - a computed home page, a sort of summary or overview of a carrel's contents
metadata.csv - a mapping of original file names to author, title, and date values
etc/reader.zip - the whole of a study carrel compressed into a single file
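These additional files are easy to inspect with nothing more than a standard library. The sketch below, again in Python and again only an illustration, reads metadata.csv and lists the contents of etc/reader.zip. The column names in metadata.csv are not assumed; they are read from the file's own header row. The function names (`read_metadata`, `list_archive`) are hypothetical.

```python
import csv
import zipfile
from pathlib import Path


def read_metadata(carrel):
    """Return (column names, rows) from metadata.csv, or two empty lists."""
    path = Path(carrel) / "metadata.csv"
    if not path.is_file():
        # metadata.csv is optional, so its absence is not an error
        return [], []
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)  # each row is a dict keyed by column name
        return list(reader.fieldnames or []), rows


def list_archive(carrel):
    """Return the names stored in etc/reader.zip, or an empty list."""
    path = Path(carrel) / "etc" / "reader.zip"
    if not path.is_file():
        return []
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

For example, `columns, rows = read_metadata("path/to/carrel")` shows which of the author, title, and date values a particular carrel actually records, and `len(rows)` gives the number of items described.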
Eric Lease Morgan <email@example.com>
November 9, 2022