Discernment, or "It's a lot about Mom."

Joey Jegier and I would like to know, "When it comes to a set of first-year student writings (reflections), with what other words does the word 'discernment' keep company?" The answer to this question helps address other, broader questions about the ways students' hearts and minds mature over the first year of their college experience.

The process

The process used to address our question has many parts, and some of the techniques we employed are enumerated below.

Articulate a research question

The first step was to articulate some sort of question. We'll call that step done.

Build the corpus

The next step was to acquire the student reflections. This resulted in a set of mostly Microsoft Word documents. We converted them to plain text, and we anonymized them. Our analysis is not possible without plain text, and anonymizing the reflections (removing the students' identities) was an IRB (Institutional Review Board) requirement. The result can be viewed as a set of individual reflections or as a whole.


From here the student, researcher, or scholar can begin to analyze (read) the materials in the traditional manner.

Count & tabulate features

At the same time, one can use text mining, natural language processing, and data science computing methods to count & tabulate features of the corpus (./adr, ./bib, ./ent, ./pos, ./urls, ./wrd). Once that is done, one can peruse the descriptive statistics generated from the results. This whole process alludes to the breadth and depth of the writings.

In this case, the corpus contains 1,400 files for a total of 2 million words. (The Bible is about 0.8 million words long.) At first glance, the corpus does not allude to the concept of discernment. On the other hand, notice the degree to which pronouns like "I" and "we" appear. Students may be writing about themselves.
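The counting and tabulating described above can be sketched with nothing more than the Python standard library. This is a minimal illustration, not the Toolbox's own code; the two sample reflections and the function names are hypothetical stand-ins.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def tabulate(documents):
    """Return (number of documents, total words, word frequencies)."""
    frequencies = Counter()
    for document in documents:
        frequencies.update(tokenize(document))
    return len(documents), sum(frequencies.values()), frequencies

# hypothetical stand-ins for the anonymized reflections
reflections = [
    "I think we learned a lot this year.",
    "My mom helped me discern my major, and I am grateful.",
]

size, total, frequencies = tabulate(reflections)

# how often do first-person pronouns appear?
pronouns = {word: frequencies[word] for word in ("i", "we", "my", "me")}
print(size, total, pronouns)
```

Against a real corpus, the list of reflections would be read from the plain text files, and the frequency table could be sorted to surface the most common words.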





Perform reality check

At this point it is a good idea to make sure we are not barking up the wrong tree, meaning, "What is this corpus about, and to what degree might we be able to address the research question?" Topic modeling (an unsupervised machine learning process used to enumerate latent themes) is a useful technique to use here. If the corpus is modeled using eight topics, then we might say the corpus is about the following things and in the illustrated proportions:

    labels   weights                                           features
    people   0.86622  people life think things good know something t...
      life   0.71300  life others people community important however...
    school   0.46745  school college friends new time first high i’ve
    career   0.18927  life career well-lived always lived time lives...
   believe   0.11733  believe faith vulnerability relationships stor...
   mission   0.10640  mission life statement career others well-live...
       god   0.10000  god leadership leader world true response refl...
 community   0.08545  community expectations hope encountered syndro...


Since we are interested in the use of a specific word -- discernment -- we can search the corpus for the word, and read items from the results in the traditional manner. Such a search returns only 345 reflections; the word discernment occurs in fewer than 25% of the reflections. The results are available in three formats:

  1. plain text results - readable by just about any software
  2. HTML results - the most useful
  3. comma-separated values (CSV) file - great for filtering and sorting
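As a check on the arithmetic, 345 of roughly 1,400 files is about 24.6%. The search itself can be sketched as follows. This is only an illustration under stated assumptions: the file names and their contents are hypothetical, and a simple prefix match on "discern" stands in for whatever search the Toolbox actually performs.

```python
import re

def files_mentioning(corpus, stem):
    """Return the names of files containing a word that begins with the stem."""
    pattern = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
    return sorted(name for name, text in corpus.items() if pattern.search(text))

# hypothetical file names and contents
corpus = {
    "reflection-001.txt": "Discernment takes time and patience.",
    "reflection-002.txt": "College is about friends and new experiences.",
    "reflection-003.txt": "I asked my mother to help me discern a career.",
    "reflection-004.txt": "High school felt very different.",
}

hits = files_mentioning(corpus, "discern")
share = len(hits) / len(corpus)
print(hits, f"{share:.0%}")
```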

Articulate lexicon

After close reading of the materials, we can elaborate on our idea of discernment and thus articulate a lexicon -- a list of desirable words -- used to connote our idea of interest. In our case, the list includes: actions, calling, career, discernment, major, meaning, purpose, and values. Similarly, we want to enumerate words of little importance to our investigation, and those words are called stopwords. All of this is important because words are merely proxies for ideas. The lexicon and stopwords enable us to broaden as well as narrow the scope of discernment.
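To make the roles of the lexicon and the stopwords concrete, here is a small sketch. The lexicon is the one listed above; the stopword list is a tiny, hypothetical one (real lists are much longer), and the function names are ours for illustration only.

```python
import re

LEXICON = {"actions", "calling", "career", "discernment",
           "major", "meaning", "purpose", "values"}

# a tiny, hypothetical stopword list; real lists are much longer
STOPWORDS = {"a", "and", "for", "i", "in", "is", "my", "of", "the", "to"}

def tokenize(text):
    """Lowercase the text and return its words."""
    return re.findall(r"[a-z']+", text.lower())

def partition(text):
    """Split tokens into (lexicon hits, other content words), dropping stopwords."""
    hits, others = [], []
    for token in tokenize(text):
        if token in STOPWORDS:
            continue
        (hits if token in LEXICON else others).append(token)
    return hits, others

hits, others = partition("Discernment is the search for meaning and purpose in my career.")
print(hits, others)
```

Adding words to the lexicon broadens what counts as "discernment"; adding words to the stopword list narrows what is left to read.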

Read, again

We can refine our close reading to include only sentences of interest: parse the corpus into sentences and filter the results by the stems of lexicon words. Each sentence in the result ought to include at least one of our lexicon words. Unfortunately, the result does not seem as meaningful as the set of sentences that contain only the stem of discernment. Some examples include:


Since words shall be known by the company they keep, concordancing for our lexicon words displays each lexicon word along with the words immediately before and after it. The result is not a set of sentences, but it does address the question, "With what other words does the word 'discernment' (and its similar words) keep company?" Some examples include:

		we discussed key questions that help to discern whether we are headed in the right
		day’s world, it is hard to find time to discern your path and find your joys in li
		ences and, as such, it is my mission to discern between the many forces in life an
		ny area is simply not enough – to truly discern what career path you will find you
		f this in mind come the time she was to discern what it was she wanted to do. she 
		e genuinely of interest to me, and then discern my major and career path based off
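A concordance like the one above (sometimes called keyword-in-context) can be sketched in a few lines. This is a simplified illustration rather than the Toolbox's implementation; the sample text, the function name, and the default width are assumptions.

```python
import re

def concordance(text, stem, width=40):
    """Return keyword-in-context lines for words beginning with the stem."""
    flat = " ".join(text.lower().split())  # collapse whitespace onto one line
    lines = []
    for match in re.finditer(r"\b" + re.escape(stem) + r"\w*", flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # right-align the left context so the keywords line up in a column
        lines.append(f"{left:>{width}}{match.group()}{right}")
    return lines

# hypothetical sample text
text = "It is hard to find time to discern your path. Discernment is a habit."

for line in concordance(text, "discern", width=20):
    print(line)
```

Because the context is cut by character count and not by word, the edges of each line are often partial words, just as in the examples above.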

Count & tabulate bigrams

Bigrams are two-word phrases, and we might ask ourselves, "What two-word phrases exist where one of the words is a form of discernment (i.e., "discern", "discerns", "discernment", etc.)?" The answer can be garnered from a frequency table and visualized in a couple of ways:

[Figure: word cloud]
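The frequency table behind such a visualization can be sketched as follows. This is a minimal illustration with a hypothetical sample text; it ignores sentence boundaries and stopwords, both of which a fuller analysis would likely handle.

```python
import re
from collections import Counter

def discern_bigrams(text, stem="discern"):
    """Count adjacent word pairs in which at least one word begins with the stem."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pairs = zip(tokens, tokens[1:])
    return Counter(pair for pair in pairs
                   if any(word.startswith(stem) for word in pair))

# hypothetical sample text
text = "We discern a path. To discern a path takes time, and discernment takes time."

table = discern_bigrams(text)
for (first, second), count in table.most_common(3):
    print(count, first, second)
```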


Count & tabulate collocations

Collocations are akin to combining concordance output and the concept of bigrams. Given a word, as well as a number of words on either side of it, which bigrams are statistically interesting? In other words, which combinations have high likelihood scores? The resulting table can be visualized as a network diagram.
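One common way to compute such a score is the log-likelihood ratio (often written G²) over a two-by-two table of counts. We assume that measure here for the sake of illustration; the text above does not say which score was actually used. The sketch also simplifies matters by scoring every adjacent bigram in a list of tokens instead of only those within a window around a given word. The sample tokens and function names are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio (G squared) for a 2x2 table of observed counts."""
    total = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    cells = ((k11, row1, col1), (k12, row1, col2),
             (k21, row2, col1), (k22, row2, col2))
    score = 0.0
    for observed, row, col in cells:
        if observed > 0:  # empty cells contribute nothing
            score += observed * math.log(observed * total / (row * col))
    return 2 * score

def score_bigrams(tokens):
    """Return {bigram: G squared} for every adjacent pair of tokens."""
    pairs = Counter(zip(tokens, tokens[1:]))
    firsts = Counter(first for first, _ in pairs.elements())
    seconds = Counter(second for _, second in pairs.elements())
    total = sum(pairs.values())
    scores = {}
    for (first, second), k11 in pairs.items():
        k12 = firsts[first] - k11    # first word followed by something else
        k21 = seconds[second] - k11  # second word preceded by something else
        k22 = total - k11 - k12 - k21
        scores[(first, second)] = log_likelihood(k11, k12, k21, k22)
    return scores

# hypothetical sample tokens
tokens = "career path career path career path my mom my dad".split()

scores = score_bigrams(tokens)
for pair, score in sorted(scores.items(), key=lambda item: item[1], reverse=True)[:3]:
    print(f"{score:6.2f}", *pair)
```

Higher scores mean the two words appear together more often than their individual frequencies would suggest. Pairs of words and their scores are what a network diagram would draw as nodes and weighted edges.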


List semantic distances

Another way to measure the frequency and relationships of words is to employ a technique called "word embedding", and it is akin to plotting words on the surface of a sphere. Then, given a word, one can determine what words are nearby as well as compute the distance between two words. Using our lexicon as input, we can create a table of semantic distances and plot the result. Thus, we can see that the concept of discernment was closely associated with the words "career" and "major":

[Figure: semantic distances]
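Training a real word embedding requires a library and a sizable corpus, so the following sketch swaps in a much simpler stand-in: each word is represented by counts of the words appearing near it, and the distance between two words is the cosine distance between those count vectors. It illustrates the idea of "words that keep similar company are close together" and nothing more; the sentences, the window size, and the function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words found within the window around it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = (math.sqrt(sum(value * value for value in a.values()))
            * math.sqrt(sum(value * value for value in b.values())))
    return 1.0 - dot / norm if norm else 1.0

# hypothetical sample sentences
sentences = [
    "Discernment helped me choose a career",
    "Discernment helped me choose a major",
    "The dining hall serves pizza",
]

vectors = cooccurrence_vectors(sentences)
print(cosine_distance(vectors["career"], vectors["major"]))
print(cosine_distance(vectors["career"], vectors["pizza"]))
```

A distance near 0 means two words share the same neighbors, and a distance of 1 means they share none. Computing the distance from each lexicon word to every other yields the kind of table that can then be plotted.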

Summary and conclusion

There are many types of reading. One reads the label of a candy bar differently than one reads the scrolling credits of a movie. One reads a scholarly journal article differently than a novel. One reads a newspaper differently than one reads social media posts. The techniques outlined above represent yet a different type of reading. The results identify things similar to the words in a back-of-the-book index. Browse the index, identify a word of interest, read the text where the word is used. Repeat. In our case we identified many different words associated with discernment, and then we read something as thorough as the entirety of the reflections or as narrow as the sentences containing discernment stems. We can then identify patterns, anomalies, or trends to help us articulate conclusions. These processes are not replacements for traditional reading. Instead, they are supplements.

As I apply such a process I see the names of many people surrounding the word discernment, and I believe the students used these people as a conduit to understanding what their major ought to be or what career path they ought to follow. It was not uncommon for the students to go home and ask their mother for advice. But this is just one person's... reading.


The software used to do this reading is divided into two parts: 1) the Distant Reader Toolbox, and 2) sets of scripts using the Toolbox. Both parts are available from a site called GitHub. The data set created by the software is affectionately called a "study carrel", and it can be temporarily downloaded from https://distantreader.org/stacks/carrels/jegier-reflections-2023/jegier-reflections-2023.zip.

Joey Jegier <jjegier2@nd.edu>
Eric Lease Morgan <emorgan@nd.edu>
Navari Family Center for Digital Scholarship
University of Notre Dame

March 9, 2023