Moby Dick by Herman Melville

A man, a whale, and whaling

Moby Dick is a novel written by the American writer Herman Melville and published in 1851. From my perspective, it is the story of a man, a whale, and whaling. I came to this conclusion through both the traditional and the distant reading processes. To do the former, I got a copy of the book and... read it. To do the latter, I downloaded an electronic version of the text from Project Gutenberg and applied various modeling techniques. (A copy of the electronic version of the text is saved locally, just in case the link to Project Gutenberg breaks.) The balance of this missive elaborates on my distant reading process.

When it comes to extents (a professional term used by librarians to connote sizes), Moby Dick is neither short nor long. It is about 218,000 words long. (By comparison, the Bible is about 800,000 words long.) Nor is the book very difficult to read; according to my calculations, Moby Dick has a Flesch readability score of 73, where a score of 0 means nobody can read the item and 100 means everybody can read the item. Based on my experience, Shakespeare's Sonnets are very easy to read (with scores in the 90s), and classic novels have scores in the 70s.
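For the curious, a readability score of this kind can be sketched in a few lines of Python. The formula is the standard Flesch Reading Ease formula; the syllable counter is a crude heuristic of my own choosing for illustration (an assumption, not necessarily what produced the score of 73), so expect the numbers to differ somewhat from those of other tools.

```python
import re

VOWELS = "aeiouy"

def count_syllables(word):
    """Rough estimate (an assumption): count vowel runs, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Words like "little" or "table" end in consonant + "le", which is a real syllable.
    consonant_le = word.endswith("le") and len(word) > 2 and word[-3] not in VOWELS
    if word.endswith("e") and count > 1 and not consonant_le:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

print(round(flesch_reading_ease("Call me Ishmael. The cat sat on the mat."), 1))
```

Short sentences made of short words push the score up; long sentences and polysyllabic words pull it down.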

The individual words (unigrams), two-word phrases (bigrams), and keywords can be counted, tabulated, and then visualized in the form of word clouds, below. (Stop words like "the", "a", and "an" have been removed from analysis.) While the illustration of frequencies is often considered sophomoric, this technique is quick, easy, and alludes to the aboutness of the novel. For example, notice how the words "ahab", "man", "moby dick", "sperm whales", "ship", and "sea" dominate. (As you may or may not know, Ahab is the captain of the whaling ship, and he is the main character of the story.) Moreover, if I were to ask you, "What kinds of whales are mentioned in the novel?", then you now have a pretty good guess. On the other hand, your curiosity may be piqued, and you might ask yourself, "What are 'stubb', 'starbuck', and 'queequeg'?"

[Figure: word cloud of unigrams]
[Figure: word cloud of bigrams]
[Figure: word cloud of keywords]
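The counting behind word clouds like these can be sketched as follows. This is a minimal illustration using only Python's standard library; the stop-word list and the sample sentence are made up for the example, and the tokenizer is a simple assumption, not the one used to build the clouds.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that"}

def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (each n-gram is a tuple)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Made-up sample text, not a quotation from the novel.
text = "The sperm whale is a whale. The sperm whale swims in the sea."
tokens = tokenize(text)
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Note that because stop words are removed before pairing, a bigram here may join two words that were separated by a stop word or a sentence break in the original text. A word cloud simply draws each item at a size proportional to its count.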

Topic modeling is a more sophisticated method often used to determine the aboutness of a text. (In a sentence, "Topic modeling is an unsupervised machine learning process used to enumerate the latent themes in a corpus." For more detail, see "Topic model" on Wikipedia.) When it comes to topic modeling, there is rarely a correct number of topics to denote, but I topic modeled on ten topics because my principal component analysis divided the text into ten topics after a mere two iterations. (See the dendrogram.)

All that said, I assert that if I topic model Moby Dick with ten topics, then those topics would be:

labels    weights  features
ahab      0.99807  ahab man ship sea time stubb head men
whales    0.2336   whales sperm leviathan time might fish world many
soul      0.08189  soul whiteness dick moby brow mild times wild
pip       0.08068  pip carpenter coffin sun fire blacksmith doubloon try-works
boats     0.07229  boats line air spout water oars tashtego leeward
queequeg  0.06133  queequeg bed room landlord harpooneer door tomahawk bedford
cook      0.0569   cook sharks dat blubber mass tun bucket bunger
whaling   0.05234  whaling ships gabriel voyage whale-ship whalers fishery english
jonah     0.04218  jonah god loose-fish fast-fish law shipmates guernsey-man woe
bildad    0.02847  bildad peleg steelkilt sailor gentlemen lakeman radney don

The topic "ahab" should really be read as the hyphenated phrase "ahab-man-ship-sea-time-stubb-head-men". Similarly, the topic "whales" should be interpreted as "whales-sperm-leviathan-time-might-fish-world-many". And so forth. Now, ask yourself, "In the big scheme of things, what is Moby Dick about?" Is it about music, religion, or cooking? If not, then what?

Topic models can be visualized in at least two ways, below. The first (a pie chart) illustrates how each of the topics compares to the whole, and we can see how the first topic -- ahab-man-ship-sea-time-stubb-head-men -- dominates. The second (a line chart) is more telling. In this case the weight of each topic is plotted over the course of the whole novel, kind of like plotting the topics over time. Thus, we can see how the first topic (abbreviated as "ahab") is pretty consistent, but the second topic ("whales") seems to ebb and flow. More specifically, "whales" peaks when "ahab" declines. There seems to be some sort of pattern going on here. No? Hmmm...

[Figure: pie chart of topics]
[Figure: line chart of topics over titles]
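To put a number on "dominates", the weights from the topic table can be converted to shares of the whole. The sketch below assumes the pie chart is drawn from the weights normalized by their sum; that is my reading of the chart, not something the modeling software states.

```python
# Topic weights copied from the table of ten topics.
topic_weights = {
    "ahab": 0.99807,
    "whales": 0.2336,
    "soul": 0.08189,
    "pip": 0.08068,
    "boats": 0.07229,
    "queequeg": 0.06133,
    "cook": 0.0569,
    "whaling": 0.05234,
    "jonah": 0.04218,
    "bildad": 0.02847,
}

# Assumption: each slice of the pie is a topic's weight divided by the sum of all weights.
total = sum(topic_weights.values())
shares = {label: weight / total for label, weight in topic_weights.items()}

for label, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{label:<10}{share:6.1%}")
```

Under that assumption, the "ahab" topic accounts for more than half of the total weight, and "whales" is a distant second.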

Using a time-tested algorithm called "term frequency/inverse document frequency" (TF-IDF), it is possible to identify statistically significant keywords in a text document. Each document may be associated with a number of keywords, and there may be many documents. These things -- the keywords and the documents -- can then become the nodes of a network graph, with an edge linking each document to each of its keywords. Once a network graph is constructed, any number of measurements can be applied to it for the purposes of identifying patterns.
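As a rough illustration of the idea, here is a minimal TF-IDF sketch in plain Python. It uses one common variant of the weighting (relative term frequency times the natural log of the inverse document frequency); the function name, the tokenizer, and the toy "chapters" are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def tfidf_keywords(documents, top_n=3):
    """Map each document name to its top_n terms by TF-IDF score.

    documents: dict of name -> text. Ties are broken alphabetically.
    """
    tokenized = {name: re.findall(r"[a-z]+", text.lower())
                 for name, text in documents.items()}

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    n_docs = len(documents)
    keywords = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        scores = {term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                  for term, count in counts.items()}
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[name] = ranked[:top_n]
    return keywords

# Made-up toy "chapters", not text from the novel.
chapters = {
    "chapter-a": "the whale and the ship and the whale",
    "chapter-b": "the cook and the sharks and the cook",
    "chapter-c": "the ship and the coffin and the carpenter",
}
print(tfidf_keywords(chapters, top_n=2))
```

Words that appear in every document (like "the" and "and" here) score zero, which is why the technique surfaces words that are distinctive to a given document.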

Such is exactly what I did. First, I divided Moby Dick into its component chapters. (There are 138 of them.) I then identified the keywords in each chapter. I then combined all of the keywords and all of the chapters together to create a network graph. Finally, I visualized the graph in two ways, below. The first illustrates the significance of each keyword and its relationship to the whole. Notice how "whale" is very much central to the story; the keyword "whale" is a keyword in many chapters. The second network graph is the result of a clustering process, which identified the distances between all the keywords and all of the chapters. From the result you can see there are a few clusters of chapters going on, and they are centered around keywords such as: whales, ship, Ahab/Stubb, head, and man. In short, the network analysis is another way of determining what keywords are associated with what chapters.

[Figure: network graph of items and keywords]
[Figure: network graph of clusters]
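The graph-building step can be sketched with nothing more than a dictionary of sets. The chapter names and keywords below are hypothetical toy data, and counting a keyword's connections (its degree) is only one simple way to measure how central it is; I am not claiming it is the measure behind the figures.

```python
from collections import defaultdict

# Hypothetical toy data: each chapter mapped to its keywords.
chapter_keywords = {
    "chapter-1": ["whale", "ship", "sea"],
    "chapter-2": ["whale", "ahab"],
    "chapter-3": ["whale", "cook", "sharks"],
    "chapter-4": ["ahab", "ship"],
}

def build_graph(chapter_keywords):
    """Build an undirected two-mode graph as an adjacency mapping.

    Nodes are (kind, name) tuples so a chapter and a keyword can never collide.
    """
    graph = defaultdict(set)
    for chapter, keywords in chapter_keywords.items():
        for keyword in keywords:
            graph[("chapter", chapter)].add(("keyword", keyword))
            graph[("keyword", keyword)].add(("chapter", chapter))
    return graph

def keyword_degrees(graph):
    """Number of chapters each keyword is connected to."""
    return {name: len(neighbors)
            for (kind, name), neighbors in graph.items() if kind == "keyword"}

graph = build_graph(chapter_keywords)
degrees = keyword_degrees(graph)
for keyword, degree in sorted(degrees.items(), key=lambda item: (-item[1], item[0])):
    print(keyword, degree)
```

In this toy data "whale" is linked to the most chapters, which mirrors the observation that a keyword shared by many chapters ends up in the middle of the graph.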

Many people know Moby Dick is about a man (Captain Ahab) and his maniacal pursuit of a white whale (who, not incidentally, bit off one of Ahab's legs sometime in the past), but I assert a great deal of the book is also about the process of whaling.

To back up this point, I first noticed how the topic model topic "whales" ebbed and flowed with the first topic, "ahab". (See the line chart, above.) I then extracted the thirty-two most significant sentences in the novel which include combinations of, and words related to, the words in the "whales" topic. Those words were: "whales", "sperm", "leviathan", "time", "might", "fish", "world", and "many". (I used a machine learning technique called "embedding" to do this work.) The thirty-two sentences have been cached locally, and granted, the result is hard to read, but give it a go. Finally, I applied a large language model to the extracted sentences for the purposes of summarization as if the model were a university professor. The result is below. Notice how neither the summary nor the extracted sentences mention Ahab, Moby Dick, or any of the other characters in the novel. No, the extracted sentences are about whaling. Moreover, the vast majority of the extracted sentences came from chapters whose topic model topic was "whales". (See the list of citations.)

As a university professor, I must say that the passage you provided is a fascinating account of the Sperm Whale and its habits. The author seems to have a deep appreciation for these magnificent creatures and their place in the ocean ecosystem.

The passage begins by describing the sheer size of the Sperm Whale, noting that it is the largest of all the leviathans hunted by whalers. The author then goes on to describe the various ways in which the Sperm Whale is different from other fish, such as its ability to breathe only every seven days or so, and its unique way of giving birth to a single calf at a time.

One of the most interesting aspects of the passage is the author's observation that the Sperm Whale is not just a solitary creature, but rather one that often travels in large herds. This has important implications for whalers, who must navigate these vast schools of whales in order to catch their prey.

The author also notes that the Sperm Whale's food, squid or cuttlefish, can be found at the bottom of the ocean, and that this is one reason why the whale's movements are so difficult to predict. However, the author suggests that by studying the logs of previous whaling voyages, it may be possible to identify patterns in the Sperm Whale's behavior that could help hunters locate their prey more effectively.

Throughout the passage, the author uses vivid language and imagery to bring the Sperm Whale to life for the reader. For example, they describe the whale's blowhole as "a Dutch barn" and its tail as "a mountain of dazzling foam." These kinds of details help to make the passage feel like a window into the world of the Sperm Whale, and give the reader a sense of just how awe-inspiring these creatures must be in person.

Overall, I would say that this passage is a fascinating glimpse into the world of the Sperm Whale, and a testament to the author's deep appreciation for these magnificent creatures.
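The sentence-extraction step that fed the summary above can be approximated as "rank every sentence by its similarity to the topic words and keep the top few". The sketch below swaps in plain word-count vectors and cosine similarity as a stand-in for learned embeddings, so it only matches exact words, not related ones; the sample sentences are made up, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Stand-in for an embedding (an assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_sentences(sentences, query, k):
    """Return the k sentences most similar to the query, best first."""
    query_vector = vectorize(query)
    ranked = sorted(sentences,
                    key=lambda s: cosine(vectorize(s), query_vector),
                    reverse=True)
    return ranked[:k]

# The query is the "whales" topic; the sentences are made up for illustration.
query = "whales sperm leviathan time might fish world many"
sentences = [
    "Sperm whales swim in many seas of the world.",
    "The landlord showed me the room and the bed.",
    "The leviathan is a fish.",
]
for sentence in top_sentences(sentences, query, 2):
    print(sentence)
```

A real embedding model would also score sentences that use related words ("cachalot", say) highly, which is the advantage of embeddings over exact word matching.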

Again, most people know Moby Dick is a white whale, and upon reading the book one comes across a couple of chapters that seem to be all about the color white, specifically chapter 41 ("Moby Dick") and chapter 42 ("The Whiteness of the Whale"). In fact, the use of the word "white" is almost overwhelming, as the following dispersion plot illustrates; the word "white" occurs many times about 4/10 of the way through the novel:

[Figure: dispersion plot of the word "white"]
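A dispersion plot boils down to recording where in the text a word occurs. A minimal sketch, assuming a simple letters-only tokenizer and ten equal-sized bins (both are my choices for illustration), looks like this:

```python
import re

def dispersion(text, word, bins=10):
    """Count occurrences of word in each of bins equal slices of the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = [0] * bins
    for position, token in enumerate(tokens):
        if token == word:
            # Integer arithmetic maps the token's position to a bin index.
            counts[position * bins // len(tokens)] += 1
    return counts

# Made-up sample: "white" is concentrated a bit before the middle of the text.
text = "sea " * 4 + "white " * 3 + "sea " * 3
counts = dispersion(text, "white", bins=10)
print(counts)
print("".join("#" if count else "." for count in counts))
```

Applied to the whole novel, a tall cluster of marks around the fourth bin would correspond to the chapters devoted to whiteness.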

Moreover, throughout the book, many things are associated with the word "white" other than the word "whale", and they include:

bears belt body bolts brow bubbles bull cedar chapels charger church coral cross curds depths dog elephants fire flag fog fowl friar ghost god ground hairs hoods hump ivory lead leg liver man mariners mass meat membrane moon mountains nun painting phantom quadruped robes sailor sea seamen shadow shark shroud skin smoke spray squalls steeds stone teeth throne tooth tower turbaned vapors veil waiter water
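One simple way to build a list like this is to collect every word that immediately follows "white". The sketch below does exactly that; it is an assumption about method (the actual list may have been produced with part-of-speech tagging or some other technique), and the sample sentence is made up.

```python
import re
from collections import Counter

def following_words(text, target):
    """Count the words that appear immediately after target."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(after for before, after in zip(tokens, tokens[1:])
                   if before == target)

# Made-up sample text, not a quotation from the novel.
text = "The white whale breached near the white squalls, and the white whale dove."
white_things = following_words(text, "white")
print(sorted(white_things))
```

Sorting the resulting words alphabetically gives a list in the same shape as the one above.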

Interesting?

Summary

From my perspective, Herman Melville's Moby Dick is about a man (Captain Ahab), a white whale (Moby Dick), and the process of whaling. More specifically, Ahab is obsessed with the capturing and killing of the white whale because the whale bit off one of his legs in a previous time. The search for the whale is set against a backdrop of the whaling profession. While not really an instruction manual, the reader learns about whaling through elaborations on the word white as well as descriptions of the subordinate characters. To echo a previous question, what are 'stubb', 'starbuck', and 'queequeg'? They are some of the subordinate characters. Stubb is the ship's second mate. Starbuck is the first mate. Queequeg is a harpooner.

Distant reading is not intended to be a replacement for traditional reading. Instead, these two types of reading are intended to complement each other. Each has its own strengths and weaknesses; each has its own advantages and disadvantages. Use this distant reading as a way to garner a broader perspective of the novel so when things are presented to you through the traditional reading process you are more aware of what is going on, and you are able to see how these things fit into the bigger picture.

Colophon

This distant reading was done against a thing called a Distant Reader "study carrel". The Distant Reader is a tool of my own design. It takes narrative texts as input (such as the chapters of a novel), applies feature extraction to the texts, and saves the results as a data set or "study carrel". These study carrels are intended to be read by people as well as computers. Moreover, they are intended to be platform- and network-independent, meaning they should be readable by just about any computer with or without an Internet connection. For more information about study carrels, read the readme file that comes with every carrel.

There are a number of quick and easy ways to get an overview of this particular study carrel. For example, begin by reading the computer-generated summary. It will bring to light many salient details about Moby Dick. For more detail, peruse the carrel's manifest. Almost all of the files are either image files or tab-delimited files that can be imported into any spreadsheet, database, analysis application (like OpenRefine), or programming language. Heck, you can even download the study carrel and do your own analysis:

http://carrels.distantreader.org/curated-moby_dick_by_melville-gutenberg/index.zip

Happy reading!


Eric Lease Morgan <eric_morgan@infomotions.com>
Lancaster, Pennsylvania (United States)

September 29, 2025