I set out to address the questions, "What is war, and how can it be justified?" I propose to accomplish my goal by doing some analysis against the suggested readings of the University's 2022 Forum on War and Peace. More specifically, I digitized the suggested readings, amalgamated the readings into a corpus, applied a number of different data science computing techniques against the corpus, and tried to answer my questions. Below is what I learned.
The corpus includes fifteen items for a total of 1.5 million words. (By comparison, the Bible is about 0.8 million words long.) The items date from 1500 (Gascoigne's The Fruites of War) to 2022 (Pope Francis's Against War). See the rudimentary bibliography complete with computed summaries and statistically significant keywords for more detail.
Unigram, bigram, and keyword visualizations begin to tell what the corpus is about:
[Figures: unigrams, bigrams, keywords]
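Counting unigrams and bigrams, the step behind visualizations like these, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the stop word list, the sample sentence, and the function names are all hypothetical, and a real analysis would read the whole corpus and use a much longer stop word list.

```python
import re
from collections import Counter

# Hypothetical, tiny stop word list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "was", "to", "in", "it"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return list(zip(*(tokens[i:] for i in range(n))))

# Made-up sample text standing in for the corpus.
sample = "War is hell. The war was long, and the war was just."

tokens = tokenize(sample)
unigrams = Counter(tokens)
bigrams = Counter(ngrams(tokens, 2))

# List the most frequent items, which is what a word cloud sizes its words by.
print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Because stop words are removed before the pairs are built, a bigram here can join two words that were separated by a stop word in the original text. That is a design choice, not a requirement.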
Hidden in the fine print, I see the phrases "just war" and "good war", which give me hope I might be able to find some sort of answers to my questions. I also see the names of many people, and I might begin to ask new questions like "Who are all of these people, and what do they do?" See the automatically generated summary page for more descriptive statistic-like details regarding the corpus. From there you might discern the corpus is a set of narratives as opposed to a set of academic writings.
To further understand the "aboutness" of the corpus, I applied topic modeling to the whole. (Topic modeling is an unsupervised machine learning process used to enumerate latent themes in a corpus.) Thus, if I model the text for a single topic, then the result is "war". I can assert, "The corpus is about war." Modeling with four topics returns few surprises, especially with a knowledge of the corpus's titles: 1) war, 2) time, 3) mining, and 4) uncle. Modeling with fifteen topics results in the following, more nuanced, themes and proportions:
   labels     weights  features
   time       1.50303  time just man day another old last still take ...
   war        0.53878  war men death life soldier love soldiers enemy...
   peace      0.42298  peace war world god people weapons power human...
   men        0.36095  men trench company fire enemy front british li...
   world      0.30624  war world american civil looking film new amer...
   know       0.25768  know around just black ing really hair inside ...
   german     0.21518  war german people germans men plane american w...
   mother     0.19793  mother courage cart kattrin war chaplain cook ...
   mining     0.08940  mining human catholic development rights socia...
   qasim      0.05114  war qasim porn roy scranton went man told fuck...
   may        0.02922  may yet might english spanish gascoigne full s...
   naples     0.02907  naples giulia captain american italian remembe...
   uncle      0.02225  uncle toby father trim corporal quoth shall wo...
   slothrop   0.00822  slothrop among light away white rocket comes r...
   someplace  0.00805  someplace herero fires interface frame hollers...
These themes can be seen as a whole, and thus we can visualize their proportions:
The underlying model is manifested as a matrix of rows and columns. If we add author values to the matrix and pivot the matrix accordingly, we can literally see to what degree each author wrote about the predicted themes. Thus, compared to the others, the Pope wrote the most about peace. Hmmm:
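The pivot described above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the author names, the topic weights, and the function names are hypothetical and are not taken from the actual model.

```python
from collections import defaultdict

# Hypothetical document-topic matrix: one row per document, each row holding
# an author value plus made-up topic weights that sum to 1.
rows = [
    {"author": "Author A", "weights": {"war": 0.2, "peace": 0.7, "time": 0.1}},
    {"author": "Author B", "weights": {"war": 0.6, "peace": 0.1, "time": 0.3}},
    {"author": "Author B", "weights": {"war": 0.4, "peace": 0.1, "time": 0.5}},
]

def pivot_by_author(rows):
    """Average the topic weights of each author's documents."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        counts[row["author"]] += 1
        for topic, weight in row["weights"].items():
            sums[row["author"]][topic] += weight
    return {
        author: {topic: total / counts[author] for topic, total in topics.items()}
        for author, topics in sums.items()
    }

def top_author(pivot, topic):
    """Return the author whose documents give the topic the most weight, on average."""
    return max(pivot, key=lambda author: pivot[author].get(topic, 0.0))

pivot = pivot_by_author(rows)
print(top_author(pivot, "peace"))
```

Averaging, rather than summing, keeps an author with many documents from dominating every theme.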
More to one of my original questions, "What is war?" One way to address that question is simply to query our corpus for "war is" or "war was", as in the following form. Unfortunately, the results link to entire books and not to chapters, let alone paragraphs or sentences.
Another solution is to apply concordancing, and to suppose the text following each of the phrases is a definition. Some of the more telling results are below, and a complete listing has been saved locally. War is:
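For readers curious about the mechanics, concordancing for candidate definitions might be sketched as follows. This is a minimal sketch and not necessarily the tool used here: the sample text and the function names are hypothetical, and sentence boundaries are approximated by the first period, exclamation point, or question mark.

```python
import re

def concordance(text, phrase, width=40):
    """Return (left, match, right) keyword-in-context triples for a phrase."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines

def definitions(text, phrases=("war is", "war was")):
    """Return the text following each phrase, up to the end of the sentence."""
    alternatives = "|".join(re.escape(p) for p in phrases)
    pattern = r"\b(?:" + alternatives + r")\b\s+([^.!?]+)"
    return [m.group(1).strip() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]

# Made-up sample text standing in for the corpus.
sample = "They say war is hell. For my uncle, war was a long wait. Peace is rare."

for left, keyword, right in concordance(sample, "war", width=15):
    print(f"{left:>15} [{keyword}] {right}")

print(definitions(sample))
```

The word boundaries in the patterns keep words such as "warfare" or "postwar" from matching.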
To paraphrase John Firth, "You shall know words by the company they keep", and consequently counting, tabulating, and visualizing the bigrams containing the word "war" is a bit illustrative, both as a word cloud and as a network diagram. The result begins to tell me how the word "war" is used in the corpus:
[Figures: bigram cloud, bigram network]
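A network diagram of this kind is drawn from a weighted edge list, and a sketch of how such a list might be built from the bigrams containing a given word follows. The token list and the function names are hypothetical, and any graph library could consume the resulting edges.

```python
from collections import Counter

def bigrams_containing(tokens, target):
    """Count adjacent token pairs in which either member is the target word."""
    pairs = zip(tokens, tokens[1:])
    return Counter(pair for pair in pairs if target in pair)

def to_edges(counts):
    """Turn bigram counts into (source, target, weight) edges, heaviest first."""
    edges = [(first, second, n) for (first, second), n in counts.items()]
    return sorted(edges, key=lambda edge: (-edge[2], edge[0], edge[1]))

# Made-up tokens standing in for the corpus.
tokens = "the civil war and the great war and the civil war".split()

counts = bigrams_containing(tokens, "war")
edges = to_edges(counts)
for source, target, weight in edges:
    print(source, target, weight)
```

The same counts can size the words of a cloud, so one tabulation feeds both visualizations.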
Oftentimes, bigrams seen in the same window of text, as in a concordance, are more informative than straight-up bigrams. These are called collocations. Collocating bigrams containing the word "war" results in these visualizations:
[Figures: collocated bigrams cloud, collocated bigrams network]
After removing the overwhelming bigrams, the results are more meaningful:
[Figures: collocated bigrams cloud, collocated bigrams network]
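Collocating within a window, and then removing the overwhelming pairs, can be sketched as below. The window size, the number of pairs to drop, the sample tokens, and the function names are all assumptions made for illustration.

```python
from collections import Counter

def collocations(tokens, target, window=3):
    """Count words found within `window` tokens on either side of the target."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update((target, n) for n in neighbors if n != target)
    return counts

def drop_overwhelming(counts, top=1):
    """Remove the `top` most frequent pairs so the quieter ones become visible."""
    overwhelming = {pair for pair, _ in counts.most_common(top)}
    return Counter({pair: n for pair, n in counts.items() if pair not in overwhelming})

# Made-up tokens standing in for the corpus.
tokens = "men died in the war and men said the war made men old".split()

counts = collocations(tokens, "war", window=3)
filtered = drop_overwhelming(counts, top=1)
print(counts.most_common(3))
print(filtered.most_common(3))
```

Unlike straight bigrams, the pairs counted here need not be adjacent; they only need to fall inside the same window around the target word.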
This data set -- study carrel -- ought to be available for downloading at http://carrels.distantreader.org/curated-how_can_war_be_justified-2023/index.zip.
Eric Lease Morgan <emorgan@nd.edu>
Navari Family Center for Digital Scholarship
University of Notre Dame
Date created: April 5, 2023
Date updated: June 2, 2024