Phoebe Boyd, Michael D. C. Drout, Namiko Hitotsubashi, Michael J. Kahn, Mark D. LeBlanc & Leah Smith, SELIM 19 (2012): 7–58. ISSN: 1132-631X

LEXOMIC ANALYSIS OF ANGLO-SAXON PROSE: ESTABLISHING CONTROLS WITH THE OLD ENGLISH PENITENTIAL AND THE OLD ENGLISH TRANSLATION OF OROSIUS

Abstract: In this paper we demonstrate that “lexomic” techniques of computer-assisted statistical analysis, originally validated for Old English poetry, can be adapted and applied to Anglo-Saxon prose texts. The methods we describe employ hierarchical agglomerative cluster analysis to find patterns of vocabulary distribution. These patterns, represented visually as tree diagrams, or dendrograms, can indicate the source structure or the affinities of Old English texts. Comparing the dendrogram geometry of multiple editions of the Old English Penitential allows us to determine that the methods can produce consistent results even for critical editions made from the collation of multiple manuscripts. Analysis of the Old English translation of Orosius’s Historia demonstrates that the techniques can detect where an author has used, for one section of his text, sources different from those of the main body of the text. We conclude that lexomic methods are a useful new tool for the analysis of Old English prose.

Keywords: Lexomics, computer-assisted analysis, digital humanities, penitentials, Orosius, Historiarum adversus paganos libri septem, Alfredian translations, sources, editions.

In a recent series of papers our research group has demonstrated the value of combining computer-assisted, statistical analysis with traditional, philological approaches to medieval texts. This lexomic¹ approach, which detects patterns of vocabulary distribution that are not otherwise visible to the unaided eye,² has already shed new light on Old English poems and on medieval Latin prose and poetic texts.³ We believe that methods originally developed for the analysis of Old English poetry can be adapted to investigate texts from the much larger corpus of Anglo-Saxon prose.

¹ Coined by Betsey Dexter Dyer in 2002, the term “lexomics” is derived by analogy from “genomics” (Dyer 2002) and first appeared in Genome Technology 1.27 (2002).
In this paper, therefore, we use lexomic methods to analyze the Old English penitentials and the Anglo-Saxon translation of Orosius’s Historiarum adversum paganos libri septem. In doing so we demonstrate not only the utility of the methods but also the specific ways they must be modified before they can be applied to prose texts, which present a particular suite of problems. The challenges presented by text length, manuscript variation and editorial practice are substantial. Nonetheless, lexomic analysis of Anglo-Saxon prose provides a new channel of information that can both support conjectures made by previous scholars and open up new lines of inquiry.

1 Lexomic Methods

Lexomic methods blend techniques from bioinformatics,⁴ computational stylometry,⁵ and traditional textual analysis (including philology, source study, historical contextualization and close reading).

² Some of the lexomic methods discussed in this article were developed with the support of the National Endowment for the Humanities, which sponsored the research with two grants: NEH HD-50300-08, Pattern Recognition through Computational Stylistics: Old English and Beyond, and NEH PR-50112011, Lexomic Tools and Methods for Textual Analysis: Providing Deep Access to Digitized Texts. Any views, findings, conclusions, or recommendations expressed in this article do not necessarily reflect those of the National Endowment for the Humanities.

³ Forthcoming papers demonstrate that lexomic methods can also be used to analyze texts in Old Norse, 20th-century Modern English (both drama and prose) and 17th-century English (drama).

⁴ Bioinformatics treats nucleobases in DNA as an alphabet, combinations of nucleobases as “words,” and genomes as texts. In their analyses, bioinformaticists have re-invented a number of techniques originally developed by philologists, such as the tracing of descent through shared error. See, for example, Dyer et al. 2007.
Using the high-quality electronic editions of medieval texts now available to researchers, we employ computer-assisted statistical techniques to identify patterns, which we then interpret using traditional literary methods. At the beginning of our research, the computational methods told us where in a text to look, while the traditional methods explained what our findings meant. As our research has progressed, however, we have found that this expected pattern has at times been reversed. Our methods have evolved into a series of iterate-and-test processes that integrate all the tools at our disposal.

Lexomic methods differ from pioneering stylometric analyses in two major ways. First, although most researchers analyze subsets of words in a text (function words or content words, for example), we include every word in our analyses. Second, while computational stylometry has traditionally focused on whole works, we divide our texts into segments and analyze the relationships of these segments to each other. The information we recover with our methods may have some bearing on questions of authorship. So far, however, our analyses have focused primarily on a text’s sources or affinities rather than on author identification.⁶ All of the techniques discussed here can be performed using our software, which is browser-interfaced and freely available in the Lexos Integrated Workflow at http://lexos.wheatoncollege.edu.⁷

⁵ Pioneers of computational stylometry include John Burrows and David Hoover. Burrows 2003 uses statistical analysis of “function words” (prepositions, conjunctions, pronouns) to create textual “signatures” for various writers, which he then uses to attribute authorship in a set of English Restoration poems. Hoover 2004 further refined Burrows’s methods and applied them to prose in third-person American novels.

⁶ Admittedly, sources and affinities can have some bearing on authorship, and we have used lexomic methods to support the case for identifying Guthlac B as being written by Cynewulf (Drout et al. 2011: 323–326).

⁷ Documentation and instructional videos and web pages are available at http://wheatoncollege.edu/lexomics/introduction-lexomics. The research for this paper was performed using a previous iteration of the tools, which are preserved in the Lexomics Tool Archive: http://wheatoncollege.edu/lexomics/tool-archive.

We begin by scrubbing an electronic edition of a text. Scrubbing removes punctuation, changes capital letters to lower-case, and deletes formatting codes and other tags.⁸ It allows us to compare like to like: we count king as the same word as King and (king), and we do not count commas or periods as “words.” After the text is scrubbed we divide it into segments and then tabulate the words both in the entire text and in each segment.⁹ Because segments can differ in size, we compute a relative frequency for each word by dividing the number of times the word appears in a segment by the total number of words in that segment.¹⁰ From these data we produce an n-dimensional array for each segment, where n is the number of distinct words used in the entire collection of texts being studied.¹¹

⁸ The program Scrubber, written primarily by Richard Neal, was used for these purposes. It can also be used to lemmatize a text or to modify special characters. Scrubber is now a part of the Lexos Integrated Workflow. The version of Scrubber used to perform the research in this paper is preserved in the Lexomics Tool Archive.

⁹ The program DiviText, written primarily by Amos Jones, was used to cut texts into segments and count the words in those segments.
DiviText itself is not part of the Lexos Integrated Workflow, although Lexos provides much of DiviText’s functionality. DiviText remains accessible in the Lexomics Tool Archive.

¹⁰ If there are 1000 words in a segment and ond appears 50 times, we record 50/1000 = 0.05 as the relative frequency of ond. If a word appears somewhere in the complete text but not in a particular segment, we record 0/1000 = 0 as that word’s relative frequency in the segment.

¹¹ Technically, the scripts use a hash table of arrays. Interested readers are directed to the documented software for specifics.

We then group the segments using the free implementation of hierarchical, agglomerative cluster analysis (Mardia et al. 1980) in the statistical software package R (R Development Core Team 2009).¹² This clustering method groups texts by a dissimilarity metric and does not require the number of groups to be specified in advance. The dissimilarity (or distance) measure is computed for each pair of segments. These distances are then used to create groupings, or clades,¹³ by clustering the texts that are most similar (i.e., that have the shortest distance between them).¹⁴ In the analyses presented in this paper, we use the most common metric, Euclidean distance,¹⁵ to calculate the distance between the multidimensional averages of two clades. Hierarchical agglomerative clustering then orders these distances and constructs a branching diagram, or dendrogram,¹⁶ of their relationships. The dissimilarity between clades is represented by the vertical length of the line connecting them.¹⁷

¹² An explanation of the statistical methods, aimed towards humanistic researchers, can be found in Drout 2013: 51–56.

¹³ The terminology is borrowed from evolutionary biology (Hennig 1966).

¹⁴ To compare four segments, we list all the words in each segment and calculate the relative frequency of each word in each segment. We then compute (4×3)/2 = 6 distances, one for each pair of segments. For each pair, we take the difference between a word’s relative frequencies in the two segments, square that difference, and sum the squared differences over all words. The distance is the square root of this sum.

¹⁵ This metric makes use of all n words in a collection of texts to measure the dissimilarity between two texts. We also experimented with the Manhattan and Canberra metrics but found no significant difference in the final clustering results. Our software allows researchers to choose among these metrics and between different linkage methods.

¹⁶ The program that creates dendrograms, TreeView, was written primarily by Alicia Herbert. TreeView is now a part of the Lexos Integrated Workflow. The version of TreeView used to perform the research in this paper is preserved in the Lexomics Tool Archive.

Figure 1 illustrates the similarities of four hypothetical segments or texts. Any level of the branching diagram can be identified as a clade. We label clades with Greek letters, working from left to right: we first mark all clades at the same level of the hierarchy, then descend to the next level and again label from left to right. Thus in Figure 1, clade α contains segment A, clade β contains B, C, and D, and clade γ contains only segments C and D. A clade with no subsidiary branches, like clade α, is said to be simplicifolious.

[Figure 1. Sample Dendrogram (leaves A, B, C, D; clades α, β, γ)]

The geometry of the dendrogram shows three things. Segments C and D in Figure 1 are the most similar. Segment B is closer to clade γ (which contains both C and D) than segment A is. Segment A is the least like the other texts. The vertical distance between segments C and D is smaller than that between the simplicifolious clade α and clade β, indicating that segment A is quite different from the other segments.
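The workflow described above can be sketched in plain Python. It scrubs the text, cuts it into segments, converts counts to relative frequencies, computes pairwise Euclidean distances, and clusters the segments. This is a minimal illustration under stated assumptions, not the Scrubber, DiviText, TreeView or R code the authors used. The scrubbing rules are simplified. Segments are fixed word counts. The clustering step repeatedly merges the two clades whose averages (centroids) are closest, which is one reading of "the distance between the multidimensional averages of two clades"; R and the authors' software also offer other linkage methods. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def scrub(text, lemmatize_and=True):
    """Simplified stand-in for the scrubbing step (assumed rules, not Scrubber's)."""
    text = re.sub(r"<[^>]+>", " ", text)      # delete markup tags
    # The Tironian note (U+204A) is a punctuation-class character and would be
    # stripped below, so expand it first (the article lemmatizes it to "and").
    text = text.replace("\u204a", " and ")
    text = text.lower()                       # King, (king) -> king
    text = re.sub(r"[^\w\s]", " ", text)      # drop punctuation; þ, ð, æ survive
    words = text.split()
    if lemmatize_and:
        words = ["and" if w == "ond" else w for w in words]
    return words


def segment(words, size):
    """Cut a token list into consecutive segments of `size` words (last may be short)."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def relative_frequencies(segments):
    """One vector per segment over the vocabulary of the whole collection."""
    vocab = sorted({w for seg in segments for w in seg})
    vectors = []
    for seg in segments:
        counts = Counter(seg)
        vectors.append([counts[w] / len(seg) for w in vocab])  # absent word -> 0
    return vocab, vectors


def euclidean(u, v):
    """Square root of the summed squared differences (footnote 14)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(vectors):
    """n*(n-1)/2 distances, one for each pair of segments."""
    return {(i, j): euclidean(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}


def centroid_clustering(vectors, labels):
    """Repeatedly merge the two clades whose averages are closest.

    Returns the dendrogram as nested tuples of labels.
    """
    clades = [(label, list(vec), 1) for label, vec in zip(labels, vectors)]
    while len(clades) > 1:
        _, i, j = min((euclidean(clades[a][1], clades[b][1]), a, b)
                      for a, b in combinations(range(len(clades)), 2))
        (ti, vi, ni), (tj, vj, nj) = clades[i], clades[j]
        # The merged clade's average is the size-weighted mean of the two averages.
        avg = [(x * ni + y * nj) / (ni + nj) for x, y in zip(vi, vj)]
        clades = [c for k, c in enumerate(clades) if k not in (i, j)]
        clades.append(((ti, tj), avg, ni + nj))
    return clades[0][0]


if __name__ == "__main__":
    # Toy one-dimensional "segments" shaped like Figure 1.
    tree = centroid_clustering([[10.0], [3.0], [1.0], [1.2]], ["A", "B", "C", "D"])
    print(tree)  # ('A', ('B', ('C', 'D')))
```

In the toy example C and D are nearly identical, so they pair first. B then joins them, and A stands apart, which reproduces the shape described for Figure 1.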
¹⁷ In our lexomic analyses the number of words is quite large, so it is difficult for the distribution of any single word to make two segments highly similar or dissimilar. Instead, it takes a great deal of commonality (or difference) in the proportionate use of a wide array of words to create large similarity (or distance) between two texts. See the discussion in Drout et al. 2011: 311–315.

Our previous work with Latin poetry and prose and with Old English poems has shown that the geometry of a dendrogram can be influenced by the affinities or sources of the texts being analyzed: similar segments or texts tend to cluster together. For example, the Old English poem Azarias is paired in a dendrogram with the section of Daniel that is known to be very similar to it (both have a recent common textual ancestor; Drout et al. 2011: 307–311). In addition, texts with multiple sources produce dendrograms in which the segments are grouped by source. A dendrogram of the Old English Genesis places Genesis B in a high-level clade entirely separate from Genesis A. Likewise, the segments of Daniel that are based on Latin canticles are separated from the rest of that poem, which is based on the Bible (Drout et al. 2011: 326–335). Dendrograms of Latin texts also reflect both sources and affinities. The source structure of Alan of Lille’s De planctu naturae is evident in its dendrogram, as is that of Geoffrey of Monmouth’s Vita Merlini. Every papal letter quoted in Bede’s Ecclesiastical History separates from Bede’s main text. A dendrogram of the Gesta Friderici Imperatoris places chapters by its two authors (Otto of Freising and his secretary, Rahewin) in separate clades (Downey et al. 2012). However, the two types of influence, that of sources and that of affinity, can also conflict with or complicate each other.
For example, the segments of the Old English poem Juliana in which Cynewulf closely follows the Latin Vita that was his source cluster separately from the rest of Cynewulf’s corpus (Drout et al. 2011: 333–335). Similarly, the segment of Guthlac B that depends on the “cup of death” motif (which is not found in Felix’s Vita s. Guthlaci) appears separately from the rest of that poem and from the signed poems of Cynewulf (Downey et al. 2012). In these and similar cases it is essential to use non-lexomic knowledge about the text to interpret the lexomic results rather than relying solely upon dendrogram geometry.

Lexomic analyses of texts whose sources, affinities or structures are well understood produce results that are neither counter-intuitive nor surprising. These results are nevertheless quite valuable. Even if a particular lexomic analysis tells us nothing entirely new about a text, the correlation of dendrogram geometry with existing knowledge gives us some confidence in analyses of texts whose authorship, sourcing or structures are unknown or controversial. The dendrograms of the known texts serve as controls for the dendrograms of the unknowns; if the former are consistent, we are not unreasonable in trusting the latter. To establish such controls, however, we must determine which variables of orthography, manuscript variation and editorial practice are significant, so that we can compare like to like.

2 Corpus-Specific Parameters

Thanks to initiatives both organizational and individual, a significant number of medieval texts are now available in electronic form.
Most important for our purposes is the complete corpus of Anglo-Saxon texts assembled by the Dictionary of Old English.¹⁸ The DOE Corpus contains a high-quality edition of every known Anglo-Saxon text in a well-curated archive. Even so, we must address some corpus-specific questions before we can perform lexomic analysis.

¹⁸ The Dictionary of Old English can be accessed at http://www.doe.utoronto.ca/index.html; a subscription is required. The tools on the lexomics.wheatoncollege.edu website produce data about the corpus but do not distribute the corpus as a whole.

First, there is the problem of orthographic variation. Our software compares and counts words according to exact identity, so orthographic variation can obscure significant patterns or create statistical artifacts in our analysis. We must therefore process the texts so as to eliminate trivial variation without losing significant data. This processing must be customized to each writing system. For the Old English corpus, the most significant orthographic variations are between thorn 〈þ〉 and eth 〈ð〉 (both of which are used to represent voiced and unvoiced interdental fricatives), and among the Tironian note 〈⁊〉, and and ond.

Scholars have long noted that the distribution of thorn and eth in the Old English corpus is not phonetically consistent. In Icelandic orthography, 〈ð〉 generally represents the voiced and 〈þ〉 the unvoiced interdental fricative. In Anglo-Saxon, by contrast, either letter can be used to represent either sound. The distribution of forms, however, is not entirely random.
Some early manuscripts use only 〈ð〉, while in later manuscripts the forms are more evenly distributed (Roberts 2006: 20–28). Different scribes also appear to have different tendencies in their use of each symbol. Some, for example, seem to avoid medial thorn or initial eth, while others do not follow these practices (Klaeber 2008: xxix–xxx, cliv–clvii). David Megginson has shown that there is significant variation in the ratio of thorn to eth from manuscript to manuscript and from scribe to scribe. He also notes that certain words are consistently spelled with one letter or the other regardless of their phonetic value in a particular context. This, he argues, suggests that the spellings were memorized rather than phonetic.¹⁹ Recent work by our research group shows that substantial variations in the ratio of thorn to eth within a given scribe’s performance in a given manuscript may be diagnostic of differences in textual source (Chauvet & Drout forthcoming).

This variation and its possible significance thus create two problems. If we treat the variation between thorn and eth as significant and count þis and ðis as two distinct words, we may be unable to compare texts found in separate manuscripts, since scribal performance might overshadow other kinds of variation.²⁰ But if we collapse the variation and treat all interdental fricatives as the same, counting þis and ðis as the same word, we might lose relevant data. Our solution to these problems is both empirical and mathematical.

¹⁹ Megginson 1993: 35–36, 49–51, 60–62, 100–107 and passim in discussions of words that contain 〈þ〉 or 〈ð〉.

²⁰ Variation between 〈i〉 and 〈y〉, which O’Donnell 2005 has shown to be the most common variation in the poetic corpus, has not to this point been a significant problem (with the exception of Beowulf).
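The eth-to-thorn consolidation the authors describe, and the arithmetic of their footnote 22, can be checked with a short sketch. Consolidation amounts to a character substitution on scrubbed, lower-case tokens. The function names and token list here are hypothetical illustrations. The sketch also computes the Euclidean value for comparison. The equality in footnote 22 holds for summed absolute differences (the Manhattan metric) but not exactly for Euclidean distance, where an even split shrinks that word's contribution.

```python
import math
from collections import Counter


def consolidate(tokens, keep="þ"):
    """Convert every eth to thorn (keep="þ") or every thorn to eth (keep="ð")."""
    drop = "ð" if keep == "þ" else "þ"
    return [t.replace(drop, keep) for t in tokens]


def manhattan(u, v):
    """Summed absolute differences, the measure used in footnote 22."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Square root of summed squared differences, the article's main metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


tokens = ["þis", "ðis", "ðis", "þæt"]
print(Counter(tokens))               # þis and ðis counted as distinct words
print(Counter(consolidate(tokens)))  # þis now counted three times

# Footnote 22: 8 instances in text A and 6 in text B, split evenly by form.
print(manhattan([8], [6]), manhattan([4, 4], [3, 3]))  # 2 2
print(euclidean([8], [6]), euclidean([4, 4], [3, 3]))  # 2.0 1.4142135623730951
```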
Our software allows us to consolidate texts by converting all eths to thorns (or vice versa). We can therefore easily compare dendrograms of the consolidated texts with those of the unconsolidated texts. To this point, after comparing hundreds of dendrograms, we have found only one complete text (Beowulf) and two segments (both in Christ III) whose locations in a dendrogram change when the texts are consolidated.²¹ Research into the characteristics of those two segments is ongoing. We can nonetheless conclude that, in the vast majority of cases, orthographic variation of thorn and eth does not affect dendrogram geometry.

Mathematically, this lack of effect can be explained by the interchangeable nature of the two letters. Even when we count þis and ðis separately, the distances between texts will be little affected if the variants are equally distributed in the segments. In that case, the relative frequency of each form in every segment will simply be half that of the consolidated form, (þis + ðis)/2.²² Only significant concentrations of either orthographic form would affect the dendrogram geometry, and such concentrations appear to be relatively rare in Old English poetry. Furthermore, since the analysis presented in this paper is lexico-morphologic rather than orthographic, we can be reasonably comfortable in using consolidated forms. Nevertheless, we have performed all the experiments discussed here using both consolidated and unconsolidated forms, and the results have been the same.

Anglo-Saxon scribes’ use of the Tironian note creates a slightly different problem, because in Old English the grapheme could represent either and or ond. Since we cannot know which form the scribe was abbreviating, expanding every note to and, or every note to ond, could both obscure existing patterns and create artifacts.²³ We could choose not to expand the note. By doing so, however, we would be privileging the manuscript form of a text over its linguistic expression, a procedure that might at times be useful but is not always justified. Furthermore, because and/ond/⁊ is the most common word in the Old English corpus, variations in its form affect the geometry of dendrograms in a way that variation between thorn and eth does not.²⁴ We therefore use our software to lemmatize ⁊, and and ond to a single form (arbitrarily, and), which eliminates artifacts created by orthography rather than by vocabulary distribution.

The problems presented by thorn and eth and by ⁊, and and ond are just a subset of the larger challenge of handling morphological, dialectal and grammatical variants. For example, the program counts cyning, cyninge and cyninga separately. Variation in the spelling of diphthongs (for example, eo or io) or vowels (i or y) could also influence word counts. We can use our software to lemmatize every word in a text, but this work is both time-consuming and inevitably subjective, and these are problems we try to avoid by using information-processing tools. We could also normalize the text, retaining grammatical variants but consolidating spellings. Again, however, the process would be both time-consuming and subjective. Furthermore, we have some evidence that the distribution of various inflected forms of words can be significant, so lemmatizing them could obscure important patterns. In most of the cases we have studied, analyses performed with un-lemmatized texts yield results that are consistent with controls (which are derived from traditional philological analyses).²⁵ This suggests that lemmatization is not necessary for these particular types of problems.²⁶ Scholars can use the software to test the utility of lemmatization further in a variety of circumstances. Lexomic analysis of lemmatized texts may yet provide information that is otherwise unavailable,²⁷ but at present we have found no benefit from lemmatizing.

²¹ Drout et al. forthcoming and Chauvet & Drout forthcoming.

²² If there were 8 instances of the consolidated word in text A and 6 in text B, the distance between the two texts would be 2. If thorn and eth are equally distributed, there would then be 4 instances of each orthographic form in text A and 3 in text B. The distance would then still be 2: (4 − 3) + (4 − 3) = 8 − 6 = 2. This equality holds exactly for the summed absolute differences used in this illustration. Under the Euclidean metric, the word’s contribution to the squared distance falls from 2² = 4 to 1² + 1² = 2, although the direction of the difference between the texts is unchanged.

²³ The spelling of and or ond can be an indicator of dialect. See Campbell 1959: 110–112.

²⁴ For example, the scribe of Daniel uses and while the scribe of Azarias uses ond. This creates a difference between the texts that is consistent but of only trivial interest for our purposes.

²⁵ The exception is Beowulf, in which the spelling and orthographic variations between the A and B scribes are so consistent that they obscure any other potential patterns. We have dealt with this challenge by using a normalized text that makes spelling consistent without erasing morphological and grammatical variation through lemmatization.

²⁶ Scott Kleinman is currently investigating the effects of lemmatization on dendrogram geometry.

²⁷ For example, full lemmatization of a text may allow us to apply lexomic analysis to texts on opposite sides of the divide between Old and Middle English.

3 Manuscripts, Editions and Editorial Practices

We must also address the problems associated with using edited or normalized texts instead of diplomatic editions. The Dictionary of Old English uses the most authoritative editions of Anglo-Saxon texts, but these are still editions. They are often collated from multiple manuscripts according to the judgments of various editors, whose editorial practices and judgments might differ both from one another and from contemporary (and future) views. As Peter Stokes has noted, large-scale analysis using any electronic corpus can in theory be shaped, to an unknown degree, by the editorial practices and assumptions embedded in that corpus (Stokes 2009). Furthermore, because lexomic analysis is based on critical editions, it may at times not engage particularly closely with any given manuscript. It may be forty years since Paul Zumthor asserted the authority of manuscripts over critical editions by calling attention to the mouvance of medieval manuscript texts (Zumthor 1987). Anglo-Saxon studies also never adopted extreme points of view like Bernard Cerquiglini’s assertion that “l’écriture médiévale ne produit pas de variantes, elle est variance” (“medieval writing does not produce variants, it is variance”; Cerquiglini 1999).²⁸ Nevertheless, the potential significance of manuscript variation has gained importance in recent years.²⁹ By relying primarily on a DOE Corpus made up of critical editions, lexomic analysis goes somewhat against the grain of manuscript-centric approaches. It is therefore important for us to investigate the influence of both manuscript variation and editorial practice. These problems are more difficult than those of orthographic variation, which lends itself to substitutions that are easy to perform on electronic texts. Their solution, however, is fundamentally similar. By analyzing texts whose structure is already known and comparing these results with those based on manuscripts, we can see how much both manuscript and post-manuscript variation influence dendrogram geometry.

The most significant problem is that of editorial collation. To give just one example, the DOE Corpus version of the Rule of St Benedict is based on Arnold Schröer’s 1885 edition. Schröer produced that edition by collating five manuscripts dating from the end of the tenth to the beginning of the twelfth centuries.
Schröer’s text, therefore, may not reflect any single extant manuscript, or even the state of any one copy of the Old English Rule in any given period (Schröer 1965 [1885]).³⁰ Before we put too much stock in the authority of any manuscript version of the text, however, we should remember that the aim of Schröer’s collation was to produce the most accurate possible text from a variety of witnesses, each of which was imperfect in its own way.³¹ If, for example, we are interested in studying the sources of the Old English Rule, we would want to work with a text as close as possible to the original translation. A later witness, by contrast, might obscure useful information through textual corruption. A diplomatic edition is not a priori more useful than a critical one.

For nearly all poetic texts in Old English the problems of editorial collation are insignificant, because most Anglo-Saxon poems appear in only one copy. Furthermore, the editors of the Anglo-Saxon Poetic Records (ASPR) were extremely careful in their transcription and generally judicious in their emendation. Nevertheless, it may be useful to compare dendrograms produced from the DOE-adopted ASPR critical edition with those produced from a diplomatic text, in order to gauge the significance of editorial emendation.

²⁸ For a useful analysis of these issues see Millett 2008, and for further discussion, see Drout & Kleinman 2010.

²⁹ Among the most successful applications of a manuscript-focused approach is Katherine O’Brien O’Keeffe’s Visible Song. In it, she uses careful examination of manuscripts to demonstrate that in the Old English tradition “an oral poem did not automatically become a fixed text upon writing” (1990: 46).

³⁰ See also Cameron & Frank 1973: 121–122.
To produce electronic diplomatic editions of our control poems, our colleague Scott Kleinman modified the DOE’s electronic files to match the manuscript forms given in the apparatus criticus of the ASPR editions. We then used these electronic diplomatic texts to repeat the experiments that had distinguished Genesis A from Genesis B and matched Azarias with the correct section of Daniel. Figures 2 and 3 show the results of our analysis of Genesis. The diplomatic and critical editions have the same high-level clade structure. In both, the first major division separates Genesis B from Genesis A. The second high-level division separates Genesis A into two large clades, one containing segments 1, 5, 6 and 7 and the other containing segments 8 through 11.

³¹ As Tom Shippey notes, it is easy to celebrate the variant after the production of readable editions, but quite another thing to try to puzzle out unedited texts for the first time (Shippey 2007: 151–152). See also Shippey 2008.

[Figure 2. Dendrogram of the Dictionary of Old English Corpus edition of Genesis cut into 1500-word segments (clades α, β, γ, δ)]

[Figure 3. Dendrogram of a diplomatic edition of Genesis cut into 1500-word segments (clades α, β, γ, δ)]

Dendrograms of the diplomatic and critical editions are identical at the higher levels of the clade structure, the levels at which we separated Genesis A from Genesis B. Only deeper within clade δ do we see some very minor variation. In both editions segments 6 and 7 are the most similar. In the critical edition, segments 5 and 1 form a separate pair, while in the diplomatic edition they join the 6–7 clade in a stepwise fashion. This is actually a very subtle distinction, probably caused by small differences in segment 5 between the diplomatic and critical editions.
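The comparison in the preceding paragraph checks whether two dendrograms agree in their high-level clades while differing deeper down. That check can be made explicit by listing the leaf sets of all clades down to a chosen depth. The trees below are hypothetical encodings built only from the relationships stated in the text, not a transcription of Figures 2 and 3. They rest on several assumptions. Genesis B is taken as segments 2–4, read from the leaf order in the figure residue. Clades whose internal order is not described are left flat. In the diplomatic tree, 5 is assumed to join the 6–7 clade before 1 does. The function names are illustrative.

```python
def leaves(tree):
    """Leaf labels under a node of a dendrogram written as nested tuples."""
    if isinstance(tree, tuple):
        return frozenset().union(*(leaves(child) for child in tree))
    return frozenset([tree])


def clades_to_depth(tree, depth):
    """Leaf sets of every clade rooted within `depth` levels of the top."""
    found = set()

    def walk(node, level):
        if level > depth:
            return
        found.add(leaves(node))
        if isinstance(node, tuple):
            for child in node:
                walk(child, level + 1)

    walk(tree, 0)
    return found


# Hypothetical encodings; flat tuples mark clades whose structure is not described.
genesis_b = (2, 3, 4)
critical = (genesis_b, (((6, 7), (5, 1)), (8, 9, 10, 11)))
# Assumption: in the diplomatic tree, 5 joins the 6-7 clade before 1 does.
diplomatic = (genesis_b, ((((6, 7), 5), 1), (8, 9, 10, 11)))

print(clades_to_depth(critical, 2) == clades_to_depth(diplomatic, 2))  # True
print(clades_to_depth(critical, 3) == clades_to_depth(diplomatic, 3))  # False
```

Comparing leaf sets rather than drawings ignores left-to-right order, which carries no meaning in a dendrogram.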
We generally have not relied heavily on the exact geometry of the deeper clade structure, which we believe to be very sensitive to minor variations, and the results of this experiment support that approach. Since there is no difference in the high-level clade structures of the two editions, there is no reason to prefer the diplomatic edition over the critical (or vice versa) in cases where this upper-level structure is of interest. Whether we had used a diplomatic or a critical edition, we would still conclude that Genesis A is distinct from Genesis B, and indeed, these two sections have different sources.

The poems Daniel and Azarias allow us to look at a relationship of affinity. Azarias is quite similar to the third 900-word section of Daniel because both derive from the same recent antecedent Old English source, even though the poems are found in two different codices, the Exeter Book and the Junius Manuscript. As we did in the Genesis experiment, we compared the dendrograms created using the electronic Dictionary of Old English critical editions to Scott Kleinman’s reconstructed diplomatic editions.

Figure 4. Dendrogram of Daniel cut into 900-word segments and Azarias in one 1064-word segment using the Dictionary of Old English Corpus editions

Figure 5. Dendrogram of Daniel cut into 900-word segments and Azarias in one 1064-word segment using diplomatic editions

In comparing Figures 4 and 5, we see that both dendrograms separate Azarias from Daniel and identify the correct 900-word segment, Daniel 3, as being most similar to Azarias. The larger clade structure of the dendrograms is essentially the same: the first, second, fourth and fifth segments of Daniel are similar to each other, and Azarias is an outlier along with the third segment of Daniel. Minor differences in the two dendrograms are found deeper inside the clades. In the dendrogram created from the diplomatic edition, the Azarias and Daniel 3 leaves are separate from each other as well as from the main text, while in the dendrogram created from the critical edition, they stick together. On the other hand, within the main body of the poem segments 1 and 2 are paired in the diplomatic edition but are slightly separated in the critical edition. Based on what we know of Daniel and Azarias from the use of traditional methods—including simply comparing the poems line-by-line and word-by-word—we conclude that the dendrogram created from the critical edition is more consistent with the actual relationship between the two poems. Azarias is very much like the third segment of Daniel, and both of these are less like the rest of the poem.

Additional experiments with other texts whose sources and affinities are known (Christ III, the signed poems of Cynewulf, and others)32 show that, at the high levels of the clade structure, dendrograms produced from diplomatic editions of Old English poems are consistently identical to those produced from critical editions. All the variations that do occur lie deep in the clade structure. In every case they consist of a pairing in the critical edition being replaced by a stepwise arrangement in the diplomatic. Our previous research has shown that accurate lexomic analysis is possible even when we use only those words which appear in every segment of a poem, thus eliminating the influence of rare words (Drout et al. 2011: 314–315). We therefore conclude that the differences between diplomatic and critical editions are relatively invisible to lexomic methods.
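The culling step just mentioned can be sketched in dependency-free Python, together with the clustering that turns segment word frequencies into a dendrogram. This is an illustrative assumption, not the research group's own software. The passage does not state a distance metric or linkage method, so the Euclidean distance and average linkage used here are choices made for the sketch. The toy segments are invented.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def culled_relative_frequencies(segments: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Relative frequency of every word that occurs in *all* segments.

    Frequencies are divided by each segment's full word count, so segments
    of unequal length remain comparable.
    """
    counts = {name: Counter(words) for name, words in segments.items()}
    shared = sorted(set.intersection(*(set(c) for c in counts.values())))
    return {
        name: [c[w] / len(segments[name]) for w in shared]
        for name, c in counts.items()
    }


def average_linkage(vectors: Dict[str, List[float]]):
    """Agglomerative clustering with Euclidean distance and average linkage.

    Returns the final tree as nested tuples and the list of merges as
    (members_a, members_b, height) in the order they occurred.
    """
    names = sorted(vectors)
    dist = {
        frozenset((a, b)): math.dist(vectors[a], vectors[b])
        for a, b in combinations(names, 2)
    }
    active: List[Tuple[object, Tuple[str, ...]]] = [(n, (n,)) for n in names]
    merges = []
    while len(active) > 1:
        best = None
        for i, j in combinations(range(len(active)), 2):
            mi, mj = active[i][1], active[j][1]
            # average linkage: mean distance over all cross-cluster pairs
            d = sum(dist[frozenset((a, b))] for a in mi for b in mj) / (len(mi) * len(mj))
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        (ti, mi), (tj, mj) = active[i], active[j]
        merges.append((mi, mj, d))
        active = [c for k, c in enumerate(active) if k not in (i, j)]
        active.append(((ti, tj), mi + mj))
    return active[0][0], merges


# Invented toy segments; "d" and "e" are culled because they are not in every segment.
segments = {
    "s1": "a a b c d".split(),
    "s2": "a a b c e".split(),
    "s3": "b b b c a".split(),
}
tree, merges = average_linkage(culled_relative_frequencies(segments))
print(tree)  # ('s3', ('s1', 's2'))
```

Because only words present in every segment survive culling, s1 and s2 become indistinguishable and merge first at height 0.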
Because we are comparing the distribution of between 500 and 1000 words per segment, because the most common words in Anglo-Saxon are those least likely to be emended, and because the ASPR editors were extremely judicious in their textual changes, we conclude that—with the possible exceptions of Beowulf, Exodus and Christ and Satan, which for various reasons are heavily emended—we would gain little or nothing from replacing the electronic critical editions with reconstructed diplomatic ones. In fact, in most of the cases we have studied, the critical editions appear to be somewhat closer to the structure of the poems. We conclude that the construction of electronic diplomatic editions for the purpose of lexomic analysis is not likely to produce benefits commensurate with the effort required to produce them. However, in cases where diplomatic electronic editions do exist, it may be worth examining them as well.

32 With the exception, as always, of Beowulf.

It is more difficult to determine whether we can have the same confidence in lexomic analyses of prose texts from the DOE Corpus. In contrast to the poems, many of the prose texts are extant in multiple manuscript witnesses. Researchers could use the apparatus of Schröer’s edition of the Rule of St Benedict to reconstruct all five texts as electronic diplomatic editions. However, the vast amount of tedious manual labor required for such an experiment is currently beyond the resources of our research group (and probably beyond the interest of all other research groups). Fortunately, Allen J. Frantzen generously provided us with electronic editions of all the manuscript versions of the Anglo-Saxon penitentials. These allowed us to compare dendrograms derived from multiple manuscripts, both to each other and to the DOE edition of the text.
This exercise has allowed us to determine the degree to which collation and editorial practice influence dendrogram geometry.

We chose to focus on the Old English Penitential, a tenth-century Anglo-Saxon text that is primarily a translation of a ninth-century Latin penitential written by Halitgar, bishop of Cambrai (Frantzen 1983: 134–139). Books 1–3 of the four-part Anglo-Saxon text translate Books 3–4 of the six-book Latin penitential (Schmitz 1958 [1898]: 275–300). The final book of the Old English Penitential, however, stems from a source written in Anglo-Saxon, the tenth-century penitential now known as the Scriftboc.33 The Old English Penitential is found in four manuscripts: Cambridge, Corpus Christi College, MS 190;34 Oxford, Bodleian Library, Junius MS 121;35 Oxford, Bodleian Library, Laud Misc. MS 482;36 and Brussels, Bibliothèque royale, MS 8558–8563 (Catalogue number 2498).37 These were used by Josef Raith to produce his 1933 collated critical edition (Raith 1964 [1933]), which is currently the text in the Dictionary of Old English Corpus. Frantzen’s digital edition of the penitentials at http://anglo-saxon.net includes all four manuscripts. Because the amount of material in the Brussels manuscript is very small, we omit this manuscript from the following discussion.

33 This text has, since Benjamin Thorpe’s 1840 edition, been incorrectly identified as the Confessionale Pseudo-Egberti (Thorpe 1840). Robert Spindler also used this title for his 1934 edition (Spindler 1934), which is used in the Dictionary of Old English corpus. However, as Frantzen notes, the attribution to Egbert is found only in the incipit of CCCC 190, and the ascription most likely refers only to the “Confessional” that follows the incipit, not the Old English Penitential itself.
To reduce confusion between Latin and Old English documents, Frantzen renamed the text Scriftboc in The Literature of Penance in Anglo-Saxon England (Frantzen 1983: 133–135).
34 Ker, Catalogue, no. 45B; Gneuss, Handlist, no. 59, an Exeter manuscript from the middle of the eleventh century.
35 Ker, Catalogue, no. 338; Gneuss, Handlist, no. 644, a Worcester manuscript from the last quarter of the eleventh century.
36 Ker, Catalogue, no. 34; Gneuss, Handlist, no. 656, a Worcester manuscript from the middle of the eleventh century.
37 Ker, Catalogue, no. 10: Glosses, penitential collections; Gneuss, Handlist, no. 808, a three-part manuscript containing material from the tenth, eleventh and twelfth centuries.

Figure 6. Ribbon diagram of Book 1 of the Old English Penitential in three manuscripts

Figure 7. Ribbon diagram of Book 2 of the Old English Penitential in three manuscripts

Figure 8. Ribbon diagram of Book 3 of the Old English Penitential in three manuscripts

Figure 9. Ribbon diagram of Book 4 of the Old English Penitential in three manuscripts

Figure 10. Legend for ribbon diagrams

Figures 6–9 represent the relationship among the manuscripts in what we call a ribbon diagram.38 The top ribbon indicates the books (1–4) of the penitential, while the lower ribbons represent the arrangement and relative size of capitulae and canons in each of the three manuscripts. Missing and disarranged sections are indicated by shading. Notice that for books 1, 2 and 3 the ribbons for CCCC 190 and Laud Misc. 482 match up almost exactly. This indicates that books 1, 2 and 3 are arranged the same way in these manuscripts, with the capitulae grouped together at the beginning of each book as a table of contents. The version of the Old English Penitential in Junius 121, however, is organized differently, with capitulae interspersed throughout the text as chapter headings for the canons.

Having the capitulae spread throughout the Junius text creates some challenges for lexomic analysis. Corpus and Laud match up segment by segment regardless of segment size, but the same content is distributed somewhat differently in the Junius manuscript. The first 1000 words of Laud and Corpus are made up entirely of capitulae, while the first 1000 words of Junius are approximately 65 percent text and 35 percent capitulae. To address this problem we used a process we call blending.39 It rearranges the material in the CCCC 190 and Laud manuscripts to create segments that allow one-to-one comparisons. We therefore cut the first three books of Corpus and Laud between the capitulae and the main text and then subdivided each of these segments in half. This produced, for each book, two shorter segments composed entirely of capitulae and two shorter segments composed entirely of regular text.
We then matched the first segment of capitulae with the first segment of text, the second segment of capitulae with the second segment of text, and so on. Finally, we blended the capitulae and their now-associated main text together into new segments. Figure 11 illustrates the process.

38 Ribbon diagrams were developed by M. D. C. Drout and Courtney LaBrie in 2011.
39 The blending technique was developed by M. D. C. Drout and Leah Smith.

Figure 11. Process of Blending produces segments with the same contents

Because the capitulae run in order, putting the first half of the CCCC 190 and Laud capitula sections with the first half of the corresponding text creates new, hybrid segments. These are composed of the same material as the Junius text segments, in which the capitulae are interspersed. We can then use these texts to create dendrograms of the three manuscript witnesses of the Old English Penitential.

Figure 12. Comparison of dendrograms of the Old English Penitential in CCCC 190 and Laud Misc. 482, segments blended

Of the three manuscripts, Corpus and Laud have the most similar dendrogram geometries, and at the highest level of the clade structure they are identical. In Figure 12 the segments are named by their relationship to Books of the Old English Penitential.
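The blending procedure described above can be sketched in Python: split the capitulae and the main text into halves, then pair the halves in order. This is a hypothetical illustration, not the authors' tool. The passage does not say whether the halves were cut at an exact word midpoint or at capitula and canon boundaries; this sketch assumes the word midpoint. The tokens are placeholders.

```python
from typing import List


def split_evenly(words: List[str], parts: int) -> List[List[str]]:
    """Split a word list into `parts` consecutive chunks of near-equal length."""
    size, extra = divmod(len(words), parts)
    chunks, start = [], 0
    for k in range(parts):
        end = start + size + (1 if k < extra else 0)
        chunks.append(words[start:end])
        start = end
    return chunks


def blend(capitulae: List[str], text: List[str], parts: int = 2) -> List[List[str]]:
    """Pair the k-th chunk of capitulae with the k-th chunk of main text.

    Because lexomic distances use word frequencies, the order of words inside
    a blended segment does not matter, so simple concatenation is enough.
    """
    cap_chunks = split_evenly(capitulae, parts)
    text_chunks = split_evenly(text, parts)
    return [caps + body for caps, body in zip(cap_chunks, text_chunks)]


# Hypothetical placeholder tokens, not words from the Penitential.
caps = ["c1", "c2", "c3", "c4"]
text = ["t1", "t2", "t3", "t4", "t5", "t6"]
for segment in blend(caps, text):
    print(segment)
# ['c1', 'c2', 't1', 't2', 't3']
# ['c3', 'c4', 't4', 't5', 't6']
```

Each blended segment keeps the same material as the matching stretch of an interspersed manuscript. Only the internal order differs, and frequency-based distances ignore that order.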
Books 1 and 3 are complete in individual segments. Books 2 and 4, because they are larger, are each divided into two segments, “a” and “b.” The high-level clade structure of the texts in the two manuscripts is identical: segments 1, 2a, 2b and 3 cluster in one clade and segments 4a and 4b in the other. Furthermore, this high-level clade structure is consistent with what we know of the sources of the Old English Penitential. Clade α (segments 1, 2a, 2b and 3), on the left of the dendrogram, has Halitgar’s Latin penitential as its source. The material represented by clade β (segments 4a and 4b), on the right side of the dendrogram, is taken from the Old English Scriftboc. In both manuscripts, segment 1 clusters with segment 3. In Laud, segments 2a and 2b also cluster together, while in Corpus 190 we see a stepwise geometry with 2a and 2b slightly separate. Because the vertical distances between the inner clades are so short, the geometry may be perturbed by only very small variations in the underlying text.

Figure 13. Dendrogram of the Old English Penitential in Junius 121

At first glance the Junius dendrogram in Figure 13 appears to have a geometry different from that of the Corpus and Laud dendrograms. Closer examination shows, however, that the dendrograms are the same as long as we take into account the absence of some material from the Junius text. In all three dendrograms, segments 1 and 3 cluster most closely, then segments 2b and 2a join that clade (stepwise in Junius and CCCC, pairwise in Laud). Material from the fourth book differs most in vocabulary and is thus separate from the rest of the dendrogram. This clade is simplicifolious in Junius simply because the text corresponding to segment 4b in Laud and CCCC is missing from the manuscript.
As Figure 14 shows, the Junius dendrogram also has essentially the same geometry as the Dictionary of Old English collated text; the only difference is the absence of segment 4b. This geometry is explained by Raith’s editorial practice of using material from Junius to fill in gaps in Corpus and Laud. Raith’s combined text therefore makes segment 4a somewhat different from what it is in either Laud or Corpus (where 4a is more similar to 4b).

Figure 14. Comparison of dendrograms of the Old English Penitential in Junius 121 with the Dictionary of Old English collated edition of the same text

Figure 15. Comparison of dendrograms of the Old English Penitential in the Laud, CCCC, and Junius 121 manuscripts and the Dictionary of Old English collated edition

Figure 15 compares the Dictionary of Old English critical edition with all three diplomatic editions. As we would by now expect, lexomic methods correctly place matching segments together, even though the texts are not entirely identical. We also see that within clades, the segments of the DOE collated critical edition stick most closely to the corresponding segments of the Laud manuscript. This shows that the DOE edition follows the vocabulary of the Laud manuscript more closely than it does the other manuscripts.
If we simplify the terminal leaves of the dendrogram (Figure 16), we can more easily see how the relationships between the texts and the critical edition are represented in the higher-level clade structure.

Figure 16. Comparison of dendrograms of the Old English Penitential in the Laud, CCCC, and Junius 121 manuscripts and the Dictionary of Old English collated edition, terminal leaves simplified

It is now easy to see that the combined dendrogram has the same high-level clade structure as the Junius dendrogram in Figure 13 (again with the exception that segment 4b is absent from the Junius text).

From these experiments we can draw several conclusions. First, at the higher levels of the clade structure, there is no significant disagreement between the dendrograms produced from diplomatic and critical editions. We can therefore use either and still get results that agree with the controls. Second, the relationships of source structure that are of particular interest to us are represented in the dendrograms of the prose texts regardless of manuscript or edition. In all cases, the material with an Old English source is separated from that with a Latin source at the highest level of the clade structure. There are small differences in dendrogram geometry between diplomatic and critical editions at lower levels of the clade structure, but these are subtle. In each case the difference is between a stepwise and a pairwise arrangement of clades with very short vertical distances between them. Such a geometry indicates only small differences in vocabulary and should not be used to draw significant conclusions.
Traditional methods conclude that the ultimate exemplar of the Old English Penitential included the first three books translated from Halitgar plus a fourth book taken from the Old English Scriftboc. If so, the critical edition accurately reflects this archetype. Furthermore, the dendrograms made from the critical edition display the same basic clade structure as those from the diplomatic editions of the manuscripts.

Our reception of texts from before the age of mechanical reproduction is strongly influenced by editorial practices, many of which are opaque to us if we read a text for content alone. We must therefore pay close attention to editorial practices at every level, from orthography to word division, emendation and collation. All of these have the potential to affect the data we use to produce dendrograms, and thus our analysis of textual structures and relationships. Prose texts are longer than Old English poetic texts and often exist in more manuscript witnesses. They therefore present challenging problems, especially if we want to compare texts edited by different editors, whose practices likely vary. However, based on both our previous analysis of Anglo-Saxon poetry and the results of this examination of the editions of the Old English Penitential, we can have reasonable confidence in lexomic investigations that use the critical editions in the Dictionary of Old English Corpus.

4 Lexomic Detection of Sources: The Old English Orosius

The Spanish priest Paulus Orosius wrote his Historiarum adversus paganos libri septem in 417 or 418 at the urging of his teacher, St Augustine of Hippo. This universal history covers events from the Fall of Man to the early fifth century. It was polemical as well as historical, attempting to demonstrate that the political and social disruptions of the author’s lifetime were not due to the adoption of Christianity and subsequent neglect of the pagan gods. In the past there were even more disasters, Orosius asserts, and so Christianity need not shoulder the blame for current (fifth-century) conditions. The Historiarum adversus paganos was extremely popular throughout the Middle Ages, with over 250 surviving manuscript witnesses (Bately 1980: lv).

Some time between 889 and 899, and probably closer to 890, Orosius’s Latin text (hereafter abbreviated as OH) was translated into Anglo-Saxon (Bately 1980: lxxxvi–xciii). On the testimony of William of Malmesbury, this Old English translation (abbreviated Or) was traditionally attributed to King Alfred (Stubbs 1887: I.132). Alfred’s authorship was never seriously questioned until 1951 (Raith 1951: 54–61). Only in 1970 did Janet Bately demonstrate that the translation was almost certainly not by the king himself, although it was probably produced as part of Alfred’s educational and translation programs (Bately 1970: 433–460). Where it follows the source text, the Anglo-Saxon translation is a basically accurate rendering of OH. As Bately notes, however, the translator does not hesitate to omit or reduce the description of many of Orosius’s interpretations of events, at times replacing them with his own observations or analyses. On the whole he converts the text from a polemical document addressing a fifth-century audience into a more general “survey of world history from a Christian standpoint” (Bately 1980: xciii). The translator also augmented his text with incidental material from various classical and patristic authors,40 perhaps drawn from annotations in the Latin manuscript that was his exemplar or from commentaries. He also added geographic information not present in OH.
The most famous of the additions are the reports of the ninth-century voyagers Ohthere and Wulfstan (hereafter referred to as the Voyages), which describe the lands and cultures of the north. There is also a great deal of geographic material in Or which either replaces or augments the contents of OH.41

40 The most complete and up-to-date list of identified or suspected sources can be found in the Fontes Anglo-Saxonici: World Wide Web Register, http://fontes.english.ox.ac.uk (accessed 25 February 2013). See also Bately 1971 (but note that this important article is keyed to Sweet’s edition, not Bately’s later text).

The DOE Corpus electronic text is Bately’s definitive 1980 E.E.T.S. edition. It is based upon London, British Library, Additional MS 47967 (manuscript L), except for the section 15/1–28/11, which is missing in L but found in London, British Library, Cotton Tiberius MS B.i (manuscript C). Bately adopts a few additions and corrections from other manuscripts, indicated by square brackets in her text. Even so, her edition is not an artificial conglomeration of multiple sources but a judicious reconstruction of the single manuscript that seems closest to the original Old English archetype (Bately 1980: xxxviii–xxxix concludes that MSS L and C are at least two removes from that text). Our pre-processing for lexomic analysis therefore requires only three steps. We consolidate thorn and eth, and we treat the Tironian et (⁊) and the spellings and and ond as a single lemma. We also perform our standard “scrubbing” to remove formatting and punctuation and to force all letters to lower-case.

We chose to divide the text into 900-word segments, a size which requires some explanation. Previous research has shown that dendrograms of Anglo-Saxon poems are broadly accurate down to a segment size of 500 words. Dendrograms based on segments closer to 1000 words, however, are somewhat more consistent in detail with the known structures of texts (Drout et al. 2011: 311–315). The trick is to avoid creating segments that split apart significant features of a text, such as spreading the Azarias section of Daniel across two segments, and thereby producing artifacts in the dendrogram. Unfortunately, when we are dealing with a text whose sources are unknown or only suspected, we do not have a source-structure to guide the placement of our dividing lines. In these cases we have found it useful to create multiple dendrograms with varying segment sizes to see which patterns are robust across them. For example, we might start with 800-word segments and then increase the segment size by 100 words until we reach 1500-word segments. We are then able to isolate distinctive sections of a text by making small adjustments to segment boundaries in subsequent experiments.42

41 The possible sources of the geographic material have been the subject of an enormous amount of scholarship. See Bately 1980: lxiii–lxx, lxxxix–xc. The most recent discussion, which is extremely thorough, is Valtonen 2008.

The Voyages of Ohthere and Wulfstan are a known feature of Or, so we sought to avoid combining them with other material in a single segment. Such a hybrid segment would have a vocabulary distribution representative of neither the Voyages nor the non-Voyages material. A 900-word division puts the Voyages into two segments (3 and 4) that do not include non-Voyages material. The dendrogram that results from performing cluster analysis on the 900-word segments of the scrubbed Old English text is shown in Figure 17. There are fifty-five 900-word segments.

Figure 17. Dendrogram of the Old English Orosius cut into 900-word segments

42 Because we calculate all distances using relative frequencies, it is not essential for the absolute sizes of each segment to be identical. However, we have found it important to avoid extreme differences in segment size because very large differences have the potential to produce artifacts in the dendrograms.

Dendrograms as large and complex as Figure 17 are more likely in the analysis of prose texts than of shorter poems. In such cases it is useful to bundle together the terminal leaves of many clades in order to see the high-level clade structure more clearly. Like Figure 16 above, Figure 18 borrows a convention from linguistics and represents large clades with triangles. The high-level clade structure of the dendrogram is thus seen to be relatively simple. There is a very significant divide between clade α (which contains only segments 1, 2, 5 and 6) and the much larger β, which includes all the rest of the Orosius translation. There are then four major divisions in β. The single-leafed γ and the bifolious clades ε and η are distinct from θ, which contains 46 segments.

Figure 18. Simplified dendrogram of the Old English Orosius cut into 900-word segments

Figure 19. Ribbon diagram of the Old English Orosius shows the organization of the text and the relationship of that organization to the segments of the dendrogram

The ribbon diagram in Figure 19 can be used to correlate the placement of each segment in the dendrogram with its content.43 The top row gives the segment number, the bottom row the book and chapter in the Old English Orosius. The core of the dendrogram, clade θ in Figures 17 and 18, represents a significant quantity of material—over 41,000 words—translated from the Latin text of Orosius’s History. The short vertical distances between sub-clades indicate that the material in this large grouping is relatively homogeneous (though there are some differences, to which we will return).
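The pre-processing and segmentation steps described earlier in this section can be sketched in Python: scrubbing, consolidating thorn and eth, unifying ⁊/and/ond, and cutting into fixed-size segments at several sizes. This is a minimal sketch under assumptions, not the group's scrubbing tool. Several choices are made here rather than taken from the paper: þ as the merged letter, and as the unified form, splitting on every non-letter, and a 50 percent threshold for folding a short final remainder into the previous segment. The paper says only that extreme differences in segment size should be avoided.

```python
import re
from typing import Dict, List

TIRONIAN_ET = "\u204a"  # the Tironian sign et, ⁊


def scrub(text: str) -> List[str]:
    """Lower-case, merge eth into thorn, unify ⁊/and/ond, and drop punctuation."""
    text = text.lower().replace(TIRONIAN_ET, " and ")
    text = text.replace("ð", "þ")                   # consolidate eth and thorn
    words = re.findall(r"[^\W\d_]+", text)          # runs of letters only
    return ["and" if w == "ond" else w for w in words]


def segment(words: List[str], size: int, min_tail: float = 0.5) -> List[List[str]]:
    """Cut a word list into `size`-word segments.

    A final remainder shorter than `min_tail * size` (a hypothetical
    threshold) is merged into the previous segment.
    """
    chunks = [words[i:i + size] for i in range(0, len(words), size)]
    if len(chunks) > 1 and len(chunks[-1]) < min_tail * size:
        tail = chunks.pop()
        chunks[-1].extend(tail)
    return chunks


def sweep(words: List[str], sizes=range(800, 1600, 100)) -> Dict[int, List[List[str]]]:
    """Segment the same text at several sizes (800 to 1500 words by default)."""
    return {size: segment(words, size) for size in sizes}


print(scrub("Ond Ðæt ⁊ þa, [and] ÆLFRED."))
# ['and', 'þæt', 'and', 'þa', 'and', 'ælfred']
print([len(s) for s in segment(["w"] * 2000, 900)])  # [900, 1100]
```

Each entry returned by `sweep` could then be passed to a clustering routine, and the resulting dendrograms compared for patterns that persist across segment sizes.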
As is discussed in much more detail below, clades α, ε and η, which are separate from θ, all have different sources from the main body of the text. Most significant for our present purposes is clade ε, which contains the Voyages (pages 13–18 in Bately’s edition). Originally composed in the vernacular and thus not translated from Latin, the Voyages have long been noted to be linguistically different from the rest of the Old English Orosius (Bately 1980: lxxii). They also differ from each other. Ohthere’s account is that of a Scandinavian visiting England, whereas Wulfstan’s is that of an Anglo-Saxon who had traveled to Scandinavia (Townend 2002: 90–95). Although the degree of influence of Old Norse upon Ohthere’s account is disputed (Townend 2002: 95–101), there is no doubt among scholars that the Voyages were composed in Old English. It is therefore significant that they are so distinctly separated from the main body of the dendrogram. As in our lexomic analysis of the Old English Penitential, we are able to detect sections of a text that have significantly different sources from those of the main body of the text. Even at this relatively crude level of analysis, therefore, we have taken a significant step towards establishing the accuracy of lexomic methods for Old English prose, since the placement of segments 3 and 4 indicates that they have a different source from the other segments, which indeed they do.

43 For a much more detailed breakdown of the contents of each segment, see Appendix A.

There are, however, additional separated clades in the dendrogram that do not contain material from the Voyages and therefore require further analysis. Clade α contains segments 1, 2, 5 and 6 of Or (Book I, chapters i–iii, with the exception of the Voyages). The sources of this geographic material are unknown and disputed.
Some of the geographic information may have been drawn from a mappa mundi (Derolez 1971),44 and other elements appear to come from the translator’s general knowledge of continental Europe in the ninth century (Bately 1980: lxvii–lxx). But regardless of where the material came from, it is certain that it is not drawn from the Latin text of Orosius’s history. Thus clade α, like clade ε, has a different source from the main body of the text in clade θ, and this difference is reflected in its placement in the dendrogram.

44 For caveats see Bately 1980: lxvii–lxx, who does not rule out the use of one or more mappae mundi but notes that the evidence of the text is “sadly inconclusive.”

Clade η likewise has additional sources beyond OH. This bifolious clade is composed of segments 7 and 8, which contain the last third of chapter iii and chapters iv–viii of Book I. As Bately demonstrates in her commentary, the material in this section is heavily modified and augmented from Orosius’s original. For instance, Bately notes that at the end of chapter iii, a comment derived from Josephus (by way of Hegesippus) has been interpolated into the text. Although the comment is found in various manuscripts of OH, it is absent from those that are closest to the deduced source of the Old English Orosius. Its inclusion, therefore, suggests that the translator used an additional source here, perhaps Isidore’s Etymologies (Bately 1980: 212–213).45 According to Bately’s commentary, segments 7 and 8 contain 16–18 places where Or contains additional material not found in Orosius’s Latin text.46 In comparison, Bately identifies only five unambiguous and two possible additions in segment 9, and these are shorter than those in segments 7 and 8.

45 Bately notes that the comment could have been derived from Augustine, Tertullian or Tacitus, and that the version closest in wording to the Old English text is Bede’s De locis sanctis. The Fontes database identifies Bede as the most likely source.

46 There are 16 notes in which Bately 1980: 212–218 identifies definite additions and two other places in which she suspects an addition.

Figure 20: Ribbon diagram of the identified sources (based on the Fontes Anglo-Saxonici database) of segments 1–10 of the Old English translation of Orosius’s Historia. Segments are 900 words long. Note the lack of Latin sources for segments 3 and 4

The Fontes Anglo-Saxonici database adds somewhat to this total: of the 165 lines in segments 7 and 8, Fontes identifies 72 as having sources in addition to Orosius, and approximately 20 of these lines are definitely not from that source. Of the 87 lines of segment 9, the Fontes database identifies up to 35 as possibly having a source outside of Orosius, but none of these definitely has an outside source. Figure 20, which represents the information in the Fontes database, shows that there are indeed more potential sources in segments 7 and 8 than in segments 9 and higher. But close inspection of the citations in the Fontes database suggests that we must be somewhat cautious here: the database lists all the possible sources for a given line but often does not indicate which was the proximate source for the Old English translator, because many of the citations are to the use of ideas rather than to phrasing from any specific text. Although the translator might have been consulting a florilegium, a well-glossed commentary or manuscripts in a well-stocked library, he may also have drawn on his own general knowledge and prior reading. For example, Bately and the Fontes database identify Genesis 41:29 as the source of a substantial passage in Book I, chapter v that describes Joseph’s prediction of the seven fat years (Bately’s lines 23.19–24.15), a passage not found in OH (Bately 1980: 213). Certainly the ultimate source for the passage is the Bible, but it seems likely that here the translator is drawing on his memory of the story rather than on any specific intermediate source, since the Old English does not translate the biblical text word for word. This passage therefore does have a different source from that of the nearby material that translates Orosius’s Latin, but we cannot be certain which text was its proximate source. Many of the other identifications of sources are likewise difficult to link to a physical text. But while segments 7 and 8 appear to have more definite sources than most of the later segments in clade θ, the density of material from non-Orosian sources is not nearly as pronounced in η as it is in ε (the Voyages) or α (the geographic material). And indeed, although η does separate from θ, it is the closest of all the outliers to that very large grouping of segments, and so it is the most similar to the main body of the text, which is translated for the most part directly from Orosius’s Latin.

The remaining anomaly in the dendrogram is segment 52, which in vocabulary distribution is less distant from the main body of the text than the geographic material but, surprisingly, more distant than the Voyages. This segment contains Book VI, chapters xxiiii–xxix and half of chapter xxx. Differences between this segment and those before and after it are not readily obvious.
At this point in Book VI there is a series of short chapters, with the effect that their opening words, “Æfter þæm þe Romeburg getimbred wæs __ wintrum” [after the time in which Rome had been established for __ years], are repeated more frequently here than in many other segments, though not notably more so than in segments 51 and 53. The ribbon diagram in Figure 21 shows that segment 52 has very few identified sources (Fontes and Bately propose oblique influence by Jerome and Isidore, but this identification is tentative and there are no obvious quotations). Bately’s note on the end of chapter xxviii speculates that there was corruption in the underlying manuscript at this point in the text, and it seems plausible that a damaged or defective text could influence a dendrogram, but in this particular case only a single sentence appears to have been affected directly by the corruption.47 We are therefore left without a good explanation for the placement of segment 52. Either it has a source or author different from that of the main body of the text but not identified by Bately or Fontes, or its very lack of additional external sources makes it distinctive in vocabulary (although segment 22 similarly has few or no known sources beyond the Latin Orosius).

47 It may be that corruption in the exemplar affected the prose style by forcing the translator to compose rather than translate, and that the resultant difference in the distribution of vocabulary is influencing the dendrogram geometry, but at this stage of our knowledge we do not have enough evidence or understanding to confirm or rule out this possibility.

Figure 21: Ribbon diagram of the identified sources (based on the Fontes Anglo-Saxonici database) of segments 47–55 of the Old English translation of Orosius’s Historia. Segments are 900 words long. Note that 52 is the only segment generally lacking in known sources

Segment 52 is therefore, at this point, an unexplained anomaly. As such, it could cast some doubt on the applicability of lexomic methods to prose texts,48 but in all other cases the high-level geometry has reflected the source structure of the texts. Lexomic methods were able to detect the differences in vocabulary distribution between the section of the Old English Penitential that has an Anglo-Saxon source and the rest of the text, which is based on the Latin penitential of Halitgar, and they were likewise able to identify, through dendrogram geometry alone, the influence of different sources in the Voyages of Ohthere and Wulfstan, the geographic material from an unknown source, and the additions to segments 7 and 8 of the Orosius translation.
We can therefore have reasonable confidence in the accuracy of the methods when extended from poetry to prose, particularly when we remember that we are using lexomics to open up a complementary channel of information about texts, not to replace traditional methods. It is when we correlate traditional methods with lexomic analysis, using each to augment the other, that we gain new insight into the texts, and it is hoped that future scholars, now alerted that something may be unusual in segment 52, may be able to discover an explanation.

48 Other seeming anomalies, however, such as the placement of Juliana in the Cynewulf dendrogram or a seemingly anomalous simplicifolious clade in Genesis, have turned out, upon further study, to have external sources. See Drout et al. 2011: 330–335.

5 The Deeper Structure of the Orosius Translation

To this point our paper has mostly developed controls against which future lexomic research into prose texts can be compared. It is hard to overstate the importance of such controls in a historical discipline like ours: we can only have confidence in the techniques if we can compare the results they produce with knowledge acquired by other means. But while controls can show us that a methodology can produce accurate results, they do not necessarily demonstrate that the methods are useful. For this latter point we want not merely confirmation of existing knowledge but unexpected additional support for more controversial hypotheses, or entirely new information. Further analysis of the lower-level clade structure of the Orosius translation gives us examples of both desiderata. For the purposes of our preceding analysis we had simplified the large and complex dendrogram of the entire translation, temporarily ignoring the details of the structure of the 46-leafed clade θ (in Figure 17). It is now time to examine its geometry more closely.
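The simplified view used in the preceding analysis, in which whole clades are drawn as triangles as in Figure 18, can be produced mechanically by collapsing every clade below a chosen depth into a single summary node. The following is a minimal sketch under stated assumptions. The dendrogram is stored as nested Python tuples with string leaves, and the names `leaves` and `summarize` and the toy tree are hypothetical. None of this describes how Figure 18 itself was drawn.

```python
def leaves(node):
    """Return the leaf labels under a node, left to right; leaves are strings."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def summarize(node, depth):
    """Keep the top `depth` levels of the tree and bundle everything below
    into one bracketed label listing its leaves (a 'triangle')."""
    if isinstance(node, str):
        return node
    if depth == 0:
        return "[" + ", ".join(leaves(node)) + "]"
    return tuple(summarize(child, depth - 1) for child in node)


toy = (("a", "b"), ("c", ("d", ("e", "f"))))
print(summarize(toy, 1))  # ('[a, b]', '[c, d, e, f]')
print(summarize(toy, 2))  # (('a', 'b'), ('c', '[d, e, f]'))
```

Depth is an arbitrary criterion here. Any other rule for choosing which clades to bundle could replace it without changing the rest of the sketch.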
Within θ the first clade to separate is κ, composed of segments 24, 25, 29, 46, 50, 51, 53, 54 and 55, and within κ segment 29 is simplicifolious, indicating that the distribution of its vocabulary is distinct from that of the rest of the material. Segment 29 contains the second half of Book III, chapter xi, in which Orosius discusses the struggles among the successors of Alexander the Great in Macedonia. The twists and turns of the plot are complex, with multiple treasons and shifts of fortune. As Bately demonstrates, the Old English translator attempted to make this section of the text clearer both by simplifying and by adding explanations. Bately’s notes refer frequently to the Epitome of Justinus,49 an early Roman historian who was a source for Orosius and whose writings clarify (at least for the modern reader) the Alexandrine succession. The verbal correspondence between Justinus and Or is never close enough for Bately to be certain that the Epitome was a source for the translator, but reading Justinus side by side with both OH and Or does show how opaque some of Orosius’s passages are in comparison to both the Epitome and the Old English text.50 Additional circumstantial evidence for the influence of Justinus may be the possibility that Asser, King Alfred’s biographer, knew the Epitome, since Michael Lapidge has shown that Justinus cannot be ruled out as a text known to Asser (Lapidge 2003: 27).51 Although the Orosius translation is no longer credited directly to the king, it is understood to have been part of his educational program and produced in his circle, of which Asser, who arrived at approximately the same time as Grimbald and John the Old Saxon, was an important member (Keynes & Lapidge 1983: 26–27).

49 For the knowledge of Justinus in Anglo-Saxon England, see Crick 1987.

50 We are grateful for Joel Relihan’s assistance with this material.
If Asser had access to a copy of Justinus, either in Wales or in England, then it is not unreasonable to suppose that the translator of Or could likewise have read Justinus and therefore used the Epitome as a source for this section of his translation. The influence of Justinus would then explain the placement of segment 29 in the dendrogram. However, when the Old English translator deviates from Orosius’s text, he does not obviously translate Justinus. Instead, it almost appears as if the translator becomes frustrated with Orosius’s circumlocutions and rather brutally simplifies the material. For instance, Orosius, in his depiction of the death of Lysimachus, offers an elaborate, somewhat poetic description, which the Old English translator renders tersely as “þær wæs Lysimachus ofslagen” (Sweet 1883: 152–153; Bately 1980: 82; Seel 1972: 148). Other, similar simplifications are found throughout the section, suggesting that the influence of Justinus, if it exists at all, is somewhat oblique. The final passage of Book III, also included in segment 29, comes from neither Orosius nor Justinus: “þonne us fremde & ellþeodge an becumaþ & lytles hwæt on us bereafiað & us eft hrædlice forlætað.” As Bately notes, “There is nothing corresponding to this in OH: indeed the situation in Rome in Orosius’ day was very different. It therefore seems reasonable to suppose that the translator is referring here to conditions in his own time, and to raids by the Vikings” (Bately 1980: 270 and xciv). Here we see the translator modifying and augmenting his source on the basis of what is presumably his own experience rather than any external text. The additional material here is only 18 words long, so it is almost certainly not in itself the entire cause of the location of segment 29 in the dendrogram.

51 However, Lapidge was not able to confirm Asser’s knowledge of Justinus because the evidence is ambiguous.
But the obvious departure from the source may indicate that in this section of the text the translator was adapting more freely than elsewhere, either because he was frustrated by Orosius and wanted to clarify him, or because he had a text (Justinus) that explained the material better, or some combination of the two. Given the current state of lexomic techniques and of our knowledge of the text, we cannot conclude at this time that segment 29 certainly has a different source (or constellation of sources) from the rest of the Orosius translation. But the correlation of the lexomic evidence with information derived from traditional methods of investigation gave us a reason to reexamine the evidence for changes in influence at this point in the text, and our subsequent scrutiny of the text has at least hinted at the translator’s practice (and perhaps at his sources and identity).

Further examination of the Orosius translation in light of the geometry of dendrograms, especially those composed of differently sized segments, may reward investigators who can correlate dendrogram geometry with previous hypotheses about structure, authorship or affinity. For example, Elizabeth Liggins’s 1970 claim for multiple authorship of the translation, based on her analysis of the distribution of various syntactic features, was reasonably criticized by Bately for, among other reasons, lacking a “control” and failing to take into account “the possibility of a single translator, gradually developing a style” (Bately 1980: lxxiv–lxxxi). Lexomic analysis does not on the first pass appear to support Liggins’s assertions, but it may be worth noting that the interior structure of clade θ, which contains the most homogeneous section of the translation, does divide roughly into three large clusters (clades ο, π and ρ in Figure 17).
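Saying that a clade divides "roughly into three large clusters" corresponds to cutting its dendrogram at the height where exactly three groups remain, that is, undoing its two highest merges. The sketch below makes several assumptions. The binary tree is stored as (left, right, merge_height) tuples with string leaves, and merge heights are monotone, with no parent lower than its children, which average linkage guarantees. The names `leaves` and `cut_into_k` and the toy tree are hypothetical. In R, `cutree(hc, k = 3)` performs the same operation on an `hclust` result.

```python
def leaves(node):
    """Leaf labels under a node; leaves are plain strings."""
    if isinstance(node, str):
        return [node]
    left, right, _height = node
    return leaves(left) + leaves(right)


def cut_into_k(tree, k):
    """Undo the k-1 highest merges and return the resulting leaf groups.

    Assumes monotone merge heights, so the tallest remaining internal node
    is always the next-highest merge in the whole tree.
    """
    groups = [tree]
    while len(groups) < k:
        internal = [g for g in groups if not isinstance(g, str)]
        if not internal:
            break  # every group is already a single leaf
        tallest = max(internal, key=lambda g: g[2])
        groups.remove(tallest)
        groups.extend(tallest[:2])  # replace it by its two children
    return [sorted(leaves(g)) for g in groups]


# Hypothetical toy tree with three well-separated groups (a, b, c).
toy_tree = (
    (
        (("a1", "a2", 0.10), "a3", 0.15),
        (("b1", "b2", 0.12), "b3", 0.18),
        0.40,
    ),
    (("c1", "c2", 0.11), "c3", 0.20),
    0.55,
)
print(cut_into_k(toy_tree, 3))
# [['c1', 'c2', 'c3'], ['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

Cutting by group count rather than by a fixed height makes dendrograms of different overall scale easier to compare. Which clusters count as "large" remains a judgment the sketch does not make.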
In research on other texts,52 we have found that producing multiple dendrograms at different segment sizes allows us to note “robust” groupings (those which appear at multiple resolutions). Identification of such robust divisions, followed by further syntactic, semantic or stylistic analysis, may allow researchers to revisit the claims for multiple authorship or to gain a better understanding of the translator’s practice. Although such analysis is beyond the scope of the current paper, which has been concerned to establish a baseline of knowledge about the applicability of lexomic methods to Anglo-Saxon prose, it can be performed with comparative ease now that the software tools are freely available and can be operated through a convenient interface.

52 Drout et al. forthcoming.

6 Conclusions

This paper set out to determine whether the lexomic techniques that have been profitably applied to Anglo-Saxon poetic texts might also be used for the analysis of Anglo-Saxon prose. We conclude that, with suitable modification, they can. Researchers must take into account not only the larger size of most prose texts but also their existence in multiple copies and recensions, which is reflected in the complexity of the critical apparatus of most editions. Our investigation of the Old English Penitential shows that lexomic analysis based upon a critical edition is consistent with that based on a diplomatic edition, but we also note that it is essential that researchers thoroughly understand an editor’s practices of collation and organization. Had we not recognized that Raith interpolated the capitula from Junius 121 into a text based primarily on Laud Misc. 482, we would not have been able to devise a useful experiment and then interpret the dendrogram correctly.
Combined with our comparison of the ASPR critical editions of poems to the diplomatic editions reconstructed from the apparatus, the evidence of the Old English Penitential dendrograms gives us some confidence in lexomic analysis based on the critical editions in the Dictionary of Old English corpus. It is important to note, however, that differing editorial practices across multiple texts may complicate the task of comparing them, and while the consistent editing of Krapp and Dobbie across the entire poetic corpus allows us to make comparisons among Anglo-Saxon poems, there is no such consistency in the editions of much of the prose. If, for example, we wanted to perform lexomic analysis across all the penitential texts in the DOE Corpus, we might find that consistent differences between Raith’s editorial practices and those of Finsterwalder or Mone generate either false positives or false negatives. Consolidation of thorn and eth and expansion of the Tironian note will obviate some artifactual differences, and others can be eliminated through orthographic normalization or even lemmatization. In the end, researchers can have confidence in lexomic analysis based on any single critical edition but must be cautious when making broader comparisons.

Our analysis of both the Old English Penitential and the Old English translation of Orosius allows us to conclude that the ability of lexomic methods to detect significant differences in the sources of texts applies to prose as well as to poetry. The dendrograms of both the penitential and Or separate material according to its sources: the final book of the penitential, which is based on the Anglo-Saxon Scriftboc, is in its own clade, as are both the non-Orosian geographic material and the Voyages of Ohthere and Wulfstan.
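The consolidation of thorn and eth and the expansion of the Tironian note, mentioned above as ways to remove artifactual differences between editions, can be applied before a text is segmented. This is a minimal sketch under stated assumptions. It folds eth into thorn, though the reverse would serve equally well if applied consistently. It expects the Tironian note to appear either as the character ⁊ or as a free-standing ampersand, as in the Old English quotations above, and it expands the note to "and". These choices and the names `normalize` and `tokenize` are illustrative and are not taken from any particular tool's settings.

```python
import re

# Assumed convention: fold eth into thorn, in both cases.
ETH_TO_THORN = str.maketrans({"ð": "þ", "Ð": "Þ"})

# Assumed convention: the Tironian note is U+204A (⁊) or an ampersand that
# stands alone (bounded by whitespace or by the ends of the text).
TIRONIAN_NOTE = re.compile(r"(?<!\S)[⁊&](?!\S)")


def normalize(text):
    """Fold ð into þ, expand the Tironian note to 'and', and lowercase."""
    text = text.translate(ETH_TO_THORN)
    text = TIRONIAN_NOTE.sub("and", text)
    return text.lower()


def tokenize(text):
    """Normalize, then split into runs of word characters."""
    return re.findall(r"\w+", normalize(text))


print(tokenize("Ðonne us fremde & ellþeodge an becumaþ"))
# ['þonne', 'us', 'fremde', 'and', 'ellþeodge', 'an', 'becumaþ']
```

Lemmatization, the stronger normalization the text mentions, needs a lexicon and is not attempted here.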
Furthermore, like material appears to be grouped with like: despite the interruption of the Voyages in segments 3 and 4, the most outlying clade in the Orosius dendrogram contains all the segments of geographic material derived from an unknown source (segments 1, 2, 5 and 6). The dendrogram not only separates differently sourced segments but also groups them correctly.

In addition to establishing controls, this paper has set out to demonstrate the utility of lexomic analysis for Anglo-Saxon prose texts. Our discussion of the possible influence of Justinus on Or shows both the promise of the methods and their challenges. In this particular case, we had no agenda with regard to the possible use of Justinus by the translator, because we were unaware that this was an open question in the scholarship. The dendrogram is therefore reasonably objective evidence that segment 29 is subtly different in vocabulary distribution from the material that surrounds it. In itself that difference is not sufficient evidence to be certain that the translator knew Justinus. Even when we combine the lexomic evidence with Bately’s very tentative hypothesis and Lapidge’s identification of the Epitome as a text that might have been known to Asser, we still find ourselves in speculative territory. But although the accumulation of circumstantial evidence is never dispositive, it is still valuable, and we can therefore conclude that the translator’s use of Justinus is somewhat more probable than it was before we knew of the lexomic results. Perhaps more significantly, we see here that the lexomic approach can show us where to look even if it cannot always tell us what we will find there. Most investigations in our field are thesis-driven: we have a hypothesis and seek evidence to support it.
Lexomic analysis can certainly be used this way, but it is perhaps even more valuable when we realize that, because lexomic methods are broadly objective and can be automated, they can be used as screening mechanisms. The Orosius translation is enormous and Bately’s edition larger still. Most researchers must approach such large texts with a pre-existing thesis for which they seek supporting evidence. In such circumstances, the mind’s ability to detect large-scale, unanticipated patterns is limited. Lexomic methods, however, can screen multiple large texts to identify particular sections that might repay scrutiny. Once these segments of interest are identified, scholars can employ traditional methods and then, in an “iterate and test” loop, return to lexomic approaches in order to generate additional evidence with which to test various hypotheses. Although they will never replace the erudite and creative scholar, lexomic methods do have the potential to become a significant tool for better understanding the culture of the Middle Ages.

Phoebe Boyd, Michael D. C. Drout, Namiko Hitotsubashi, Michael J. Kahn, Mark D. LeBlanc & Leah Smith53
Wheaton College (Mass.)

53 Corresponding author: mdrout@wheatoncollege.edu.

Appendix A: Segment Ranges and Contents in Or

Segment No. | Word Range | Book & Chapter | Pages & lines in Bately
1 | 1–900 | I.i | 8.11–11.6
2 | 901–1800 | I.i | 11.6–13.23
3 | 1801–2700 | I.i | 13.24–12.2
4 | 2701–3600 | I.i | 16.3–18.6
5 | 3601–4500 | I.i | 18.6–20.20
6 | 4501–5400 | I.i–I.ii–I.iii | 20.20–23.3
7 | 5401–6300 | I.iii–I.iv, v, vi, vii | 23.3–25.24
8 | 6301–7200 | I.vii–I.viii | 25.24–28.8
9 | 7201–8100 | I.viii–I.ix, x | 28.8–30.31
10 | 8101–9000 | I.x–I.xi, xii | 30.31–33.21
11 | 9001–9900 | I.xii, I.xiii, I.xiiii–II.i | 33.21–36.12
12 | 9901–10800 | II.i, II.ii | 36.12–39.3
13 | 10801–11700 | II.iii–II.iiii | 39.3–41.23
14 | 11701–12600 | II.iiii | 41.23–44.9
15 | 12601–13500 | II.iiii–II.v | 44.9–46.23
16 | 13501–14400 | II.v | 46.23–48.35
17 | 14401–15300 | II.v, II.vi, II.vii, II.viii | 48.35–51.19
18 | 15301–16200 | II.viii–III.i | 51.19–54.4
19 | 16201–17100 | III.i, III.ii, III.iii | 54.4–56.24
20 | 17101–18000 | III.iii, III.iv, III.v | 56.24–59.24
21 | 18001–18900 | III.v, III.vi, III.vii | 59.24–62.14
22 | 18901–19800 | III.vii–III.viii | 62.14–64.29
23 | 19801–20700 | III.viii | 64.29–67.15
24 | 20701–21600 | III.viii, III.viiii | 67.15–70.1
25 | 21601–22500 | III.viiii | 70.1–72.25
26 | 22501–23400 | III.viiii–III.x | 72.25–75.11
27 | 23401–24300 | III.x–III.xi | 75.11–78.1
28 | 24301–25200 | III.xi | 78.1–80.22
29 | 25201–26100 | III.xi–IV.i | 80.22–83.10
30 | 26101–27000 | IV.i | 83.10–85.27
31 | 27001–27900 | IV.i, IV.ii, IV.iii, IV.iiii | 85.27–88.16
32 | 27901–28800 | IV.iii–IV.v | 88.16–91.6
33 | 28801–29700 | IV.v–IV.vi | 91.6–93.29
34 | 29701–30600 | IV.vi | 93.29–96.14
35 | 30601–31500 | IV.vi–IV.vii | 96.14–99.1
36 | 31501–32400 | IV.vii, IV.viii, IV.ix | 99.2–101.19
37 | 32401–33300 | IV.ix–IV.x | 101.19–104.7
38 | 33301–34200 | IV.x | 104.7–106.25
39 | 34201–35100 | IV.x–IV.xi | 106.25–109.10
40 | 35101–36000 | IV.xi, IV.xii, IV.xiii | 109.10–112.3
41 | 36001–36900 | IV.xiii, V.i, V.ii | 112.3–114.24
42 | 36901–37800 | V.ii, V.iii | 114.24–117.18
43 | 37801–38700 | V.iii, V.iiii, V.v, V.vi, V.vii | 117.18–120.20
44 | 38701–39600 | V.vii, V.viii, V.ix, V.x | 120.20–123.16
45 | 39601–40500 | V.x, V.xi, V.xii | 123.16–126.14
46 | 40501–41400 | V.xii, V.xiii | 126.14–129.6
47 | 41401–42300 | V.xiii, V.xiiii, V.xv | 129.6–132.2
48 | 42301–43200 | V.xv, VI.i, VI.ii | 132.2–134.29
49 | 43201–44100 | VI.ii, VI.iii, VI.v | 134.29–137.23
50 | 44101–45000 | VI.v, VI.vi, VI.vii, VI.viii, VI.viiii, VI.x, VI.xi, VI.xii, VI.xiii | 137.23–141.7
51 | 45001–45900 | VI.xiii, VI.xiiii, VI.xv, VI.xvi, VI.xvii, VI.xviii, VI.xviiii, VI.xx, VI.xxi, VI.xxii, VI.xxiii | 141.7–144.18
52 | 45901–46800 | VI.xxiii, VI.xxiiii, VI.xxv, VI.xxvi, VI.xxvii, VI.xxviii, VI.xxviiii, VI.xxx | 144.18–148.7
53 | 46801–47700 | VI.xxx, VI.xxxii | 148.7–151.6
54 | 47701–48600 | VI.xxxii, VI.xxxiii, VI.xxxiiii, VI.xxxv, VI.xxxvi | 151.6–154.4
55 | 48601–49452 | VI.xxxvi, VI.xxxvii, VI.xxxviii | 154.4–156.23

References

Bately, J. 1970: King Alfred and the Old English Translation of Orosius. Anglia 88: 433–460.
Bately, J. 1971: The Classical Additions in the Old English Orosius. In P. Clemoes & K. Hughes eds. England Before the Conquest. Cambridge, Cambridge University Press: 237–251.
Bately, J. ed. 1980: The Old English Orosius (E.E.T.S. S.S. 6). London, Oxford University Press.
Burrows, J. F. 2003: Questions of Authorship: Attribution and Beyond. Computers and the Humanities 37: 5–32.
Cameron, A. & R. Frank 1973: A Plan for the Dictionary of Old English. Toronto, University of Toronto Press.
Campbell, A. 1959: Old English Grammar. Oxford, Clarendon Press.
Cerquiglini, B. 1999: In Praise of the Variant: A Critical History of Philology [Betsy Wing trans. 1989: Éloge de la variante]. Baltimore, Johns Hopkins University Press.
Chauvet, E. & M. D. C. Drout forthcoming: Visual Representation of the Ratio of þ to þ+ð: A New Tool for the Investigation of Old English Textual History.
Crick, J. 1987: An Anglo-Saxon fragment of Justinus’ Epitome. Anglo-Saxon England 16: 181–196.
Derolez, R. 1971: The orientation system in the Old English Orosius. In P. Clemoes & K. Hughes eds. England Before the Conquest. Cambridge, Cambridge University Press: 253–268.
Downey, S., M. D. C. Drout, M. Kahn & M. LeBlanc 2012: ‘Books Tell Us’: Lexomic and Traditional Evidence for the Sources of Guthlac A. Modern Philology 110: 1–29.
Downey, S., M. D. C. Drout, V. Kerekes & D. Raffel forthcoming: Lexomic Analysis of Medieval Latin Texts.
Drout, M. D. C. 2013: Tradition and Influence in Anglo-Saxon Literature: An Evolutionary, Cognitivist Approach. New York, Palgrave Macmillan.
Drout, M. D. C. & S. Kleinman 2010: Philological Inquiries 2: Something Old, Something New: Material Philology and the Recovery of the Past. The Heroic Age 13. http://www.mun.ca/mst/heroicage/issues/13/pi.php (accessed 2 March 2013).
Drout, M. D. C., M. Kahn, M. LeBlanc & C. Nelson 2011: Of Dendrogrammatology: Lexomic Methods for Analyzing the Relationships Among Old English Poems. Journal of English and Germanic Philology 110: 301–336.
Drout, M. D. C., Y. Kisor, A. Dennett, N. Piirainen & L. Smith forthcoming: Lexomic Analysis of Beowulf.
Dyer, B. 2002: Genome Technology 1.27. http://www.genomeweb.com/blunt-end-0 (accessed 1 November 2002).
Frantzen, A. J. 1983: The Literature of Penance in Anglo-Saxon England. New Brunswick (NJ), Rutgers University Press.
Frantzen, A. J. 2013: The Anglo-Saxon Penitentials: A Cultural Database. http://www.anglo-saxon.net/penance (accessed 2 March 2013).
Gneuss, H. 2001: Handlist of Anglo-Saxon Manuscripts. Tempe (AZ), Arizona Medieval and Renaissance Texts and Studies.
Hennig, W. 1966: Phylogenetic Systematics [D. D. Davis & R. Zangerl trans. 1950: Grundzüge einer Theorie der phylogenetischen Systematik]. Urbana, University of Illinois Press.
Hoover, D. L. 2004: Testing Burrows’s Delta. Literary and Linguistic Computing 19.4: 453–475.
Ker, N. R. 1957: Catalogue of Manuscripts Containing Anglo-Saxon. Oxford, Clarendon Press.
Keynes, S. & M. Lapidge 1983: Alfred the Great: Asser’s Life of King Alfred and Other Contemporary Sources. London, Penguin.
Lapidge, M. 2003: Asser’s Reading. In T. Reuter ed. Alfred the Great. London, Ashgate.
Liggins, E. 1970: The Authorship of the Old English Orosius. Anglia 88: 289–322.
Mardia, K., J. Kent & J. Bibby 1980: Multivariate Analysis. London, Academic Press.
Megginson, D. 1993: The Written Language of Old English Poetry. (PhD Dissertation). Toronto, University of Toronto.
Millett, B. 2008: What is mouvance? http://www.soton.ac.uk/~wpwt/mouvance/mouvance.htm (accessed 12 December 2012).
O’Brien O’Keeffe, K. 1990: Visible Song: Transitional Literacy in Old English Verse. Cambridge, Cambridge University Press.
O’Donnell, D. P. 2005: Cædmon’s Hymn: A Multimedia Study, Edition and Archive. Woodbridge, D. S. Brewer.
R Development Core Team 2009: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. http://www.R-project.org (accessed 2 March 2013).
Raith, J. ed. 1964 [1933]: Die altenglische Version des Halitgar’schen Bussbuches (sog. Poenitentiale Pseudo-Ecgberti). Darmstadt, Wissenschaftliche Buchgesellschaft.
Raith, J. 1951: Untersuchungen zum englischen Aspekt, I. Grundsätzliches Altenglisch. Munich, Hueber.
Roberts, J. 2006: Guide to Scripts Used in English Writings up to 1500. London, British Library.
Schmitz, H. J. ed. 1958 [1898]: Die Bussbücher und das kanonische Bussverfahren. Graz, Akademische Druck- u. Verlagsanstalt.
Schröer, A. ed. 1964 [1885]: Die Angelsächsischen Prosabearbeitungen der Benediktinerregel. Darmstadt, Wissenschaftliche Buchgesellschaft.
Seel, O. 1972: M. Iuniani Iustini epitoma Historiarum Philippicarum Pompei Trogi. Stuttgart, B. G. Teubner.
Shippey, T. 2007: Fighting the Long Defeat: Philology in Tolkien’s Life and Works. In Roots and Branches: Selected Papers on Tolkien by Tom Shippey. Jena, Walking Tree Publishers.
Shippey, T. 2008: Response to three papers on ‘Philology: Whence and Whither?’ given by Drs Utz, Macgillivray, and Zolkowski, at Kalamazoo, 4th May 2002. The Heroic Age 11. http://www.mun.ca/mst/heroicage/issues/11/foruma.php (accessed 2 March 2013).
Spindler, R. ed. 1934: Das altenglische Bussbuch (sog. Confessionale Pseudo-Egberti). Leipzig, Tauchnitz.
Stokes, P. 2009: The Digital Dictionary. Florilegium 26: 37–65.
Stubbs, W. ed. 1887: William of Malmesbury De Gestis Regum Anglorum (Rolls Series 90). London, Longman.
Sweet, H. 1883: King Alfred’s Orosius: Part I: Old English Text and Latin Original (E.E.T.S. O.S. 79). London, Trübner.
Thorpe, B. 1840: Ancient Laws and Institutes of England. 2 vols. London, G. E. Eyre and A. Spottiswoode.
Townend, M. 2002: Language and History in Viking Age England: Linguistic Relations between Speakers of Old Norse and Old English. Turnhout, Brepols.
Valtonen, I. 2008: The North in the Old English Orosius: A Geographical Narrative in Context. Helsinki, Société Néophilologique.
Zumthor, P. 1972: Essai de poétique médiévale. Paris, Seuil.
Zumthor, P. 1987: La lettre et la voix: de la ‘littérature’ médiévale. Paris, Seuil.

Received 16 Apr 2013; accepted 14 Sep 2013