About Distant Reader Study Carrels
==================================

tl;dnr - Distant Reader study carrels are data sets, and they are designed to be read by computers as well as people.

Introduction
------------

The Distant Reader takes a collection of texts as input, and it outputs data sets called "study carrels". This readme file elaborates on these ideas and through the process addresses the question, "Why should I care?"

Basic Layout
------------

Enhancements To The Basic Layout
--------------------------------

Desktop Applications
--------------------

There are quite a number of desktop computer applications that can be used against study carrels, and they fall into a number of categories.

Text Editors

Text editors are not word processors. While text editors and word processors both work with text, the former are more about the manipulation of text, and the latter are more about graphic design. The overwhelming majority of data found in study carrels is in the form of plain text, and you will find the use of a decent text editor indispensable. Using a text editor, you can open and read just about any file found in a study carrel. A good text editor supports powerful find & replace functionality, supports regular expressions, has the ability to open multi-megabyte files with ease, can turn line wrapping on and off, and reads text files created by different computers. The following two text editors are recommended. Don't rely on Microsoft Word or Google Docs; they are word processors.

BBEdit - https://www.barebones.com/products/bbedit/
NotePad++ - https://notepad-plus-plus.org/

Word Cloud Applications

The use of word clouds is often viewed as sophomoric. This is true because they are too often used to illustrate the frequency of all words in a text. On the other hand, if word clouds illustrate the frequencies of specific things -- keywords, parts-of-speech, or named entities -- then word clouds become much more compelling.
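For instance, the frequencies behind such a word cloud can be tabulated with a few lines of Python. This is only a sketch; the sample values below stand in for the contents of a carrel's actual keyword files:

```python
# A minimal sketch: tabulate keyword frequencies suitable for feeding a
# word cloud application. The list of keywords is an invented example,
# standing in for values read from a study carrel.
from collections import Counter

def tabulate(items):
    """Count items, most frequent first."""
    counts = Counter(item.strip().lower() for item in items if item.strip())
    return counts.most_common()

keywords = ["reading", "data", "reading", "text", "data", "reading"]
print(tabulate(keywords))  # [('reading', 3), ('data', 2), ('text', 1)]
```

Delimited output like this -- a word and its count -- is exactly the kind of input Wordle's "advanced" mode accepts.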
After all, "A picture is worth a thousand words."

A program called Wordle is an excellent word cloud program. It takes raw text as input. It also accepts delimited data as input. The resulting images are colorful, configurable, and exportable. Unfortunately, it is no longer supported; while it will run on most Macintosh computers, it will no longer run (easily) on Windows computers. (I would pay a fee to have Wordle come back to life and brought up-to-date.) If Wordle does not work for you, then there is an abundance of Web-based word cloud applications for use.

Wordle - https://web.archive.org/web/20191115162244/http://www.wordle.net/

Concordances

Developed in the 13th century, concordances are among the oldest of text analysis techniques. They function like the rudimentary find function you see in many applications. Think control-F on steroids. Concordances locate a given word in a text, display the text surrounding the word, and help you understand what other words are used in the same context. After all, to paraphrase a linguist named John Firth, "You shall know a word by the company it keeps." The following is a link to a concordance application that is worth way more than what you pay for it, which is nothing.

AntConc - https://www.laurenceanthony.net/software/antconc/

Spreadsheet-Like Applications

The overwhelming majority of the content found in study carrels is in the form of plain text, and most of this plain text is structured as tab-delimited text files -- matrices, sometimes called "data frames". These files are readable by any spreadsheet or database application. Microsoft Excel, Google Sheets, or Macintosh Numbers can import Reader study carrel delimited data, but these programs are more about numerical data and less about textual data. Thus, if you want to do analysis against Reader study carrel data, and if you do not want to write your own software, then the use of an analysis program called OpenRefine is highly recommended.
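Such tab-delimited files can also be read programmatically. The following sketch uses Python's standard csv module; the column names and sample rows are hypothetical illustrations, not a documented carrel layout:

```python
# A sketch of reading tab-delimited (TSV) data with Python's csv module.
# The column names (token, lemma, pos) and rows are invented examples;
# in practice one would open a real carrel file instead of a StringIO.
import csv
import io

# Stand-in for something like: open('pos.tsv', encoding='utf-8')
tsv = io.StringIO("token\tlemma\tpos\nReaders\treader\tNOUN\nread\tread\tVERB\n")

rows = list(csv.DictReader(tsv, delimiter="\t"))
nouns = [row["lemma"] for row in rows if row["pos"] == "NOUN"]
print(nouns)  # ['reader']
```

Because each row is a plain dictionary, the same pattern supports counting, filtering, and sorting without any spreadsheet at all.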
OpenRefine eats delimited data for lunch. Once data is imported, OpenRefine supports powerful find & replace functions, counting & tabulating functions, faceting, sorting, exporting, etc. While text editors and concordances supplement traditional reading functions, OpenRefine supplements the process of understanding study carrels as data.

OpenRefine - https://openrefine.org/

Topic Modeling Applications

Topic modeling is a type of machine learning process called "clustering". Given an integer (I), a topic modeler will divide a corpus into I clusters, and each cluster is akin to a theme. Thus, after practicing with a topic modeler, you can address questions like: What are the things this corpus is about? To what degree are themes manifested across the corpus? Which documents are best represented by which themes? After supplementing the corpus with metadata (authors, titles, dates, keywords, genres, etc.), topic modeling becomes even more useful because you can address additional questions, such as: How did these themes ebb & flow over time? Who wrote about what? How is this style of writing different from that style of writing?

The venerable MALLET application is the grand-daddy of topic modeling tools, but it is a command-line driven thing. On the other hand, a program called Topic Modeling Tool, which is rooted in MALLET, brings topic modeling to the desktop. Like all the applications listed here, it requires practice to use well, but it works, it works quickly, and the data it outputs can be used in a myriad of ways.

Topic Modeling Tool - https://github.com/senderle/topic-modeling-tool

Network Analysis Applications

Texts can be modeled in the form of networks -- nodes and edges. For example, there are authors (nodes), there are written works (additional nodes), and specific authors write specific works (edges). Similarly, there are works (nodes), there are keywords (additional nodes), and specific works are described with keywords (edges).
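Such an author-to-work network can be sketched in a few lines of Python; the author and work names below are invented examples, and real edges would come from a carrel's metadata:

```python
# A sketch of modeling texts as a network. Edges connect author nodes to
# work nodes; the names are hypothetical examples, not carrel data.
from collections import Counter

edges = [
    ("Austen", "Emma"),
    ("Austen", "Persuasion"),
    ("Dickens", "Bleak House"),
]

# The degree of each author node answers "what author wrote the most?"
degree = Counter(author for author, work in edges)
print(degree.most_common(1))  # [('Austen', 2)]
```

Real tools like Gephi do the same bookkeeping at scale and add layout, filtering, and centrality measures on top of it.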
Given these sorts of networks, you can address -- and visualize -- all sorts of questions: Who wrote what? What author wrote the most? What keywords dominate the collection? What keywords are highly significant (central) to many works and therefore to many authors?

Network analysis is rooted in graph theory, and it is not a trivial process. On the other hand, a program called Gephi makes the process easier. Import one of any number of different graph formats or specifically shaped matrices, apply any number of layout options to visualize the graph, filter the graph, visualize again, apply clustering or calculate graph characteristics, and visualize a third time. The process requires practice, some knowledge of graph theory, and an aesthetic sensibility. In the end, you will garner a greater understanding of your carrel.

Gephi - https://gephi.org

Command-Line (Shell) Interface
------------------------------

Reader Toolbox And Command-Line Interface
-----------------------------------------

Reader Toolbox and the Python Application Programmer Interface
--------------------------------------------------------------

Write your own software
-----------------------

Summary
-------

--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame

April 13, 2024