Reading at Scale. Mixing Methods in Literary Corpus Analysis

conducted by: Thomas Weitin and Ulrik Brandes
funded by the Volkswagen Foundation

“Reading at Scale” is based on the following premise: if hermeneutic methods are appropriate for analyzing rich detail and statistical methods are appropriate for large-scale data analysis, then a mix of the two should be better suited to analyses on intermediate scales than either approach alone. Literary corpora naturally lend themselves to hierarchical aggregation, with many intermediate stages from the level of characters to entire literatures. Consequently, many of the research questions that arise are associated with representations on such intermediate scales. The focal object of our study is a historical collection of 86 novellas published by the German authors Paul Heyse and Hermann Kurz under the title “Der deutsche Novellenschatz” (24 volumes, 1871–1876). We have prepared this realism-oriented collection as a TEI XML corpus, and more such collections are to follow. Its medium size conveniently marks the boundary of what is still within reach of a single determined reader yet sufficiently large for statistical analysis. Two dissertations will study this text sample on distinct levels of operationalization: (1) a stylometric corpus analysis tackles the problem of realistic style; (2) a positional network analysis addresses the problem of distinction in a popular literature; (3) a comparative study explores the “Deutsche Novellenschatz” as a powerful instrument of canonization and an attempt at a non-narrative literary history. The PIs address basic research problems in order to integrate these individual studies: an algorithmic subproject is concerned with positional concepts in network science, and a literary studies subproject focuses on validating digital analyses.