It's all about Emma,… and her interactions with the people around her.
Emma -- published in 1815 -- is one of the handful of novels written by the English author Jane Austen. Put very succinctly, it is the story of Emma and her interactions with the people around her.
My summary (above) and "distant reading" (below) have been done against a version of Emma found at Project Gutenberg, and a copy has been cached locally, just in case.
As far as novels go, Emma is not very long -- only 160,000 words. By comparison, the Bible is about 800,000 words long, and Melville's Moby Dick is about 218,000 words long. Nor is the book very difficult to read; Emma has a Flesch readability score of 73, where 0 means nobody can read it, and 100 means anybody can read it. Based on my experience, scores in the 70s are typical for popular novels.
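The Flesch reading-ease score combines average sentence length with average syllables per word: 206.835 − 1.015 × (words ÷ sentences) − 84.6 × (syllables ÷ words). The post does not say which tool produced the score of 73, so the following is only a minimal Python sketch under stated assumptions: sentences end at `.`, `!`, or `?` (so "Mr." is miscounted as a sentence end), and syllables are estimated by counting runs of vowels, a rough heuristic that dedicated readability tools refine. Its result for Emma would likely differ somewhat from 73.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting runs of vowels (treating y as a vowel).

    This is a rough heuristic, not a dictionary lookup.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A final silent "e" ("made") usually adds no syllable; "-le" ("table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

The formula is not strictly bounded by 0 and 100: a very short, monosyllabic sentence such as "The cat sat on the mat." scores about 116 with this sketch.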
I used a large language model to summarize each of the fifty-five chapters in Emma, and in the results I see a lot of Emma and the names of people around her. All of the summaries are available here, and a few of the more interesting ones are included below:
Emma Woodhouse has lived for 21 years in a comfortable home and happy disposition. Her mother died long ago and she had a governess, Miss Taylor, who died recently. Emma and Miss Taylor had been friends for 16 years. Miss Taylor married Mr. Weston. Emma mourned the loss of her friend.
Emma has given Harriet's fancy a proper direction and raised the gratitude of her young vanity to a very good purpose. Emma is convinced that Mr. Elton's falling in love with Harriet. He talked of Harriet warmly and praised her. Emma wants Harriet's picture.
Emma is worried about Frank Churchill. He came to Highbury for a couple of hours. They met with the utmost friendliness. He was in high spirits, but he was not without agitation. Emma suspects he is less in love than he had been.
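The post does not describe how the novel was divided into chapters or which model wrote the summaries. Assuming the Project Gutenberg header and footer have already been removed and each chapter begins with a line such as "CHAPTER I", the splitting step might be sketched in Python as follows; `summarize` is a hypothetical stand-in for whatever function sends a chapter to a language model. If the file also contains volume headings, they would end up attached to the end of the preceding chapter and would need to be stripped as well.

```python
import re
from typing import Callable, List

# Assumption: each chapter begins with a heading line such as "CHAPTER I" or
# "CHAPTER XVIII" (Roman numerals) standing on a line of its own.
CHAPTER_HEADING = re.compile(r"^CHAPTER [IVXLC]+\.?[ \t]*$", re.MULTILINE)


def split_chapters(text: str) -> List[str]:
    """Split a plain-text novel into chapter bodies.

    Anything before the first heading (title page, front matter) is dropped.
    """
    parts = CHAPTER_HEADING.split(text)
    return [part.strip() for part in parts[1:]]


def summarize_all(chapters: List[str], summarize: Callable[[str], str]) -> List[str]:
    """Apply a summarizer to each chapter, preserving chapter order.

    `summarize` is hypothetical: plug in whatever function calls your model.
    """
    return [summarize(chapter) for chapter in chapters]
```

Passing the summarizer in as a function keeps the splitting logic independent of any particular model or API.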
Rudimentary frequencies of words, phrases, and keywords begin to allude to the aboutness of any corpus, and Emma is no different. The following word clouds illustrate the frequency of single words, two-word phrases ("bigrams"), and computed keywords (kinda like "subject headings"). Notice how the names of people dominate:
[Figure: word cloud of single words]
[Figure: word cloud of bigrams]
[Figure: word cloud of keywords]
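Counting single words and bigrams, the raw material for word clouds like these, can be sketched in a few lines of Python. The stop-word list here is a tiny, hypothetical one for illustration (real analyses use much longer lists), and the post's "computed keywords" rely on a method it does not describe, so they are not attempted here.

```python
import re
from collections import Counter
from typing import List, Tuple

# A deliberately tiny stop-word list for illustration only.
STOP_WORDS = {
    "a", "an", "and", "as", "at", "be", "but", "for", "had", "have", "he",
    "her", "his", "i", "in", "is", "it", "not", "of", "on", "she", "so",
    "that", "the", "to", "was", "with", "you",
}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and return its alphabetic tokens (keeping "emma's")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def top_unigrams(text: str, n: int = 10) -> List[Tuple[str, int]]:
    """The n most frequent words that are not stop words."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(tokens).most_common(n)


def top_bigrams(text: str, n: int = 10) -> List[Tuple[str, int]]:
    """The n most frequent adjacent word pairs containing no stop word.

    Pairs come from the unfiltered token stream, so two words separated by a
    stop word never form a bigram. Sentence boundaries are ignored.
    """
    tokens = tokenize(text)
    pairs = (f"{a} {b}" for a, b in zip(tokens, tokens[1:])
             if a not in STOP_WORDS and b not in STOP_WORDS)
    return Counter(pairs).most_common(n)
```

A word-cloud tool would then scale each word or phrase by counts like these.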
Topic modeling is a computing process used -- among other things -- to enumerate latent themes in a corpus. It is a form of unsupervised machine learning, akin to "clustering". As with just about any other modeling process done by computers, there are many ways to topic model, and topic modeling is not necessarily deterministic. That said, when I topic model Emma, the four themes below present themselves, and when I model for a number of topics greater than four, the overall result does not change very much. The theme of "emma" always predominates, and the visualizations echo that predominance:
labels | weights | features |
---|---|---|
emma | 3.0353 | emma harriet weston knightley elton time great woodhouse quite nothing dear always |
engagement | 0.2443 | engagement affection attachment snow circumstances happiness behaviour letter feeling heart resolution reflection |
charade | 0.18906 | charade likeness sit eye sea lines alone wingfield maid picture smith's south |
jane | 0.16189 | jane fairfax bates campbell dixon colonel cole dancing campbells instrument dance crown |
[Figure: topics]
[Figure: topics over time]
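The post does not name its topic-modeling tool or explain how the table's weights were computed, so the following only illustrates one common technique: latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, in plain Python. Each document is a list of tokens (for Emma, perhaps one list per chapter, which is an assumption); `k` is the number of topics; `alpha`, `beta`, `iterations`, and `seed` are arbitrary illustrative values. Because the sampler is random, different seeds can yield different topics, which is one concrete reason topic modeling is not necessarily deterministic.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], k: int, iterations: int = 200,
              alpha: float = 0.1, beta: float = 0.01,
              seed: int = 0) -> Tuple[List[str], List[List[int]]]:
    """Collapsed Gibbs sampling for LDA. Returns (vocab, topic-word counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    doc_topic = [[0] * k for _ in docs]       # tokens in doc d assigned to topic t
    topic_word = [[0] * v for _ in range(k)]  # times word w is assigned to topic t
    topic_total = [0] * k                     # tokens assigned to topic t overall
    assignments = []
    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            t = rng.randrange(k)
            z_doc.append(t)
            doc_topic[d][t] += 1
            topic_word[t][index[w]] += 1
            topic_total[t] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, t = index[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_word[t][wi] -= 1
                topic_total[t] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [(doc_topic[d][j] + alpha)
                           * (topic_word[j][wi] + beta) / (topic_total[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][wi] += 1
                topic_total[t] += 1
    return vocab, topic_word


def topic_top_words(vocab: List[str], topic_word: List[List[int]],
                    n: int = 5) -> List[List[str]]:
    """The n highest-count words for each topic (ties broken alphabetically)."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in topic_word]
```

Listing each topic's highest-count words gives "features" columns like the ones in the table, though the labels and weights there come from the author's own tool.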
Analyzing (reading) Emma through the lens of network graphs echoes the same things. In the first graph, items and keywords are highlighted. Notice how the keyword "emma" is the largest and most central feature. When a clustering method ("modularity") is applied to the network graph, only a small handful of "neighborhoods" present themselves, chiefly one around "emma" and another around "jane". (By the way, Jane's last name is Fairfax.) Such is illustrated in the second graph.
This analysis does not output the types of interactions Emma has with the people around her, but it can allude to the people Emma interacts with and to what degree. More specifically, if I assume sentences containing pairs of people's names connote some sort of relationship between those people, then I can count such sentences, create an edges table, and graph the result. Such is what I did, and based on this, I assert Emma has the greatest number of interactions with Harriet, Jane, and Elton:
[Figure: items and keywords]
[Figure: clusters]
[Figure: interactions]
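The sentence-counting step described above might be sketched in Python as follows. The `CHARACTERS` dictionary is a hypothetical mapping from name forms to characters (not the author's list), and the sentence splitter is naive, knowing only about the honorifics "Mr." and "Mrs.". Each sentence contributes at most one count to each pair of distinct characters it mentions, and the result is rendered as a source/target/weight edges table of the kind graphing tools commonly import.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List

# Hypothetical mapping from lower-cased name forms to one canonical character.
# "weston" conflates Mr. and Mrs. Weston; a real list would need more care.
CHARACTERS: Dict[str, str] = {
    "emma": "Emma", "harriet": "Harriet", "jane": "Jane", "fairfax": "Jane",
    "elton": "Elton", "knightley": "Knightley", "weston": "Weston",
    "frank": "Frank", "churchill": "Frank",
}


def split_sentences(text: str) -> List[str]:
    """Break after . ! or ? plus whitespace, but not after "Mr." or "Mrs."."""
    parts = re.split(r"(?<!Mr\.)(?<!Mrs\.)(?<=[.!?])\s+", text)
    return [p for p in parts if p.strip()]


def cooccurrence_edges(text: str, names: Dict[str, str] = CHARACTERS) -> Counter:
    """For each pair of characters, count the sentences that mention both."""
    edges: Counter = Counter()
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z]+", sentence.lower())
        # A set, so a name repeated within one sentence is counted once.
        present = sorted({names[w] for w in words if w in names})
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


def edges_table(edges: Counter) -> str:
    """Render the counts as a tab-separated source/target/weight table."""
    rows = ["source\ttarget\tweight"]
    rows += [f"{a}\t{b}\t{w}" for (a, b), w in edges.most_common()]
    return "\n".join(rows)
```

Because "jane" and "fairfax" map to the same character, a sentence such as "Jane Fairfax played." adds no edge.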
Understanding what nouns exist in a corpus helps one understand what things exist in a corpus. Understanding what verbs exist in the same corpus helps one understand what those things do. In an effort to understand what nouns are in Emma, what they do, and to what, I extracted all the sentences of the form subject-verb-object. I then visualized the result in the following three graphs. The first illustrates which subjects predominate. Notice how the subjects are people. The second graph illustrates what those subjects do, and words like "took", "brought", and "said" stand out. The final graph illustrates what the actions were done to. While it is not as obvious, the items in the third graph lean towards very generic things (like "nothing" or "anything") and people, essentially male people (like "mr. woodhouse", "mr. knightley", or "a man"):
[Figure: subjects]
[Figure: verbs]
[Figure: objects]
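The post does not say which parser found the subject-verb-object sentences. The sketch below assumes each sentence has already been run through a dependency parser and stored as tokens carrying a dependency label and the index of their syntactic head; the labels follow spaCy's English conventions ("nsubj", "dobj"), with the Universal Dependencies "obj" also accepted. The example parses are hand-written for illustration, not real parser output. Only single-token subjects and objects are kept, so phrases like "mr. woodhouse" or "a man" would need an extra step, such as expanding each token to its noun phrase.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    dep: str   # dependency label, e.g. "nsubj", "dobj", "ROOT"
    head: int  # index (within the sentence) of this token's syntactic head


SUBJECT_LABELS = {"nsubj"}
OBJECT_LABELS = {"dobj", "obj"}


def svo_triples(sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (subject, verb, object) triples found in one parsed sentence."""
    triples = []
    for i, verb in enumerate(sentence):
        # Dependents of token i; skip i itself, since some parsers make the
        # root token its own head.
        deps = [t for j, t in enumerate(sentence) if t.head == i and j != i]
        subjects = [t.text for t in deps if t.dep in SUBJECT_LABELS]
        objects = [t.text for t in deps if t.dep in OBJECT_LABELS]
        for s in subjects:
            for o in objects:
                triples.append((s, verb.text, o))
    return triples


def tally(sentences: Iterable[List[Token]]) -> Tuple[Counter, Counter, Counter]:
    """Count subjects, verbs, and objects (lower-cased) across all triples."""
    subjects, verbs, objects = Counter(), Counter(), Counter()
    for sentence in sentences:
        for s, v, o in svo_triples(sentence):
            subjects[s.lower()] += 1
            verbs[v.lower()] += 1
            objects[o.lower()] += 1
    return subjects, verbs, objects


# Hand-written parses, for illustration only.
EXAMPLE = [
    [Token("Emma", "nsubj", 1), Token("said", "ROOT", 1),
     Token("nothing", "dobj", 1), Token(".", "punct", 1)],
    [Token("Harriet", "nsubj", 1), Token("took", "ROOT", 1),
     Token("the", "det", 3), Token("picture", "dobj", 1), Token(".", "punct", 1)],
]
```

The three counters returned by `tally` correspond to the three graphs: subjects, verbs, and objects.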
See also the manifest and the computed summary page.
Eric Lease Morgan <eric_morgan@infomotions.com>
August 30, 2025