It looks like you're using Internet Explorer 11 or older. This website works best with modern browsers such as the latest versions of Chrome, Firefox, Safari, and Edge. If you continue with this browser, you may see unexpected results.
Stopwords are words of high frequency but low meaning (such as function words, like "a," "an," "of," "the," etc.) that can hinder some text analyses (stylometry is a key exception, as it analyzes these words). Stopword lists tell the text analysis software the words to ignore.
Stemmers + Lemmatizers reduce inflected words (ex. "thinks," "thinking," "thinker," etc.) to their root (ex. "think"), which can be helpful in text analysis. Lemmatizers attempt to account for a word's context and part of speech (i.e. whether "saw" is a noun or verb) but can be complex and run slowly; Stemmers do not account for context and POS but tend to be simple and fast.
Distant Horizons by Ted UnderwoodJust as a traveler crossing a continent won't sense the curvature of the earth, one lifetime of reading can't grasp the largest patterns organizing literary history. This is the guiding premise behind Distant Horizons, which uses the scope of data newly available to us through digital libraries to tackle previously elusive questions about literature. Ted Underwood shows how digital archives and statistical tools, rather than reducing words to numbers (as is often feared), can deepen our understanding of issues that have always been central to humanistic inquiry. Without denying the usefulness of time-honored approaches like close reading, narratology, or genre studies, Underwood argues that we also need to read the larger arcs of literary change that have remained hidden from us by their sheer scale. Using both close and distant reading to trace the differentiation of genres, transformation of gender roles, and surprising persistence of aesthetic judgment, Underwood shows how digital methods can bring into focus the larger landscape of literary history and add to the beauty and complexity we value in literature.
Publication Date: 2019-02-14
Distant Reading by Franco MorettiHow does a literary historian end up thinking in terms of z-scores, principal component analysis, and clustering coefficients? The essays in Distant Reading led to a new and often contested paradigm of literary analysis. In presenting them here Franco Moretti reconstructs his intellectual trajectory, the theoretical influences over his work, and explores the polemics that have often developed around his positions. From the evolutionary model of Modern European Literature, through the geo-cultural insights of Conjectures of World Literature and Planet Hollywood, to the quantitative findings of Style, inc. and the abstract patterns of Network Theory, Plot Analysis, the book follows two decades of conceptual development, organizing them around the metaphor of distant reading, that has come to define well beyond the wildest expectations of its author a growing field of unorthodox literary studies.
Publication Date: 2013-06-04
Enumerations by Andrew PiperFor well over a century, academic disciplines have studied human behavior using quantitative information. Until recently, however, the humanities have remained largely immune to the use of data--or vigorously resisted it. Thanks to new developments in computer science and natural language processing, literary scholars have embraced the quantitative study of literary works and have helped make Digital Humanities a rapidly growing field. But these developments raise a fundamental, and as yet unanswered question: what is the meaning of literary quantity? In Enumerations, Andrew Piper answers that question across a variety of domains fundamental to the study of literature. He focuses on the elementary particles of literature, from the role of punctuation in poetry, the matter of plot in novels, the study of topoi, and the behavior of characters, to the nature of fictional language and the shape of a poet's career. How does quantity affect our understanding of these categories? What happens when we look at 3,388,230 punctuation marks, 1.4 billion words, or 650,000 fictional characters? Does this change how we think about poetry, the novel, fictionality, character, the commonplace, or the writer's career? In the course of answering such questions, Piper introduces readers to the analytical building blocks of computational text analysis and brings them to bear on fundamental concerns of literary scholarship. This book will be essential reading for anyone interested in Digital Humanities and the future of literary study.
Publication Date: 2018-08-29
Macroanalysis by Matthew L. JockersIn this volume, Matthew L. Jockers introduces readers to large-scale literary computing and the revolutionary potential of macroanalysis--a new approach to the study of the literary record designed for probing the digital-textual world as it exists today, in digital form and in large quantities. Using computational analysis to retrieve key words, phrases, and linguistic patterns across thousands of texts in digital libraries, researchers can draw conclusions based on quantifiable evidence regarding how literary trends are employed over time, across periods, within regions, or within demographic groups, as well as how cultural, historical, and societal linkages may bind individual authors, texts, and genres into an aggregate literary culture. Moving beyond the limitations of literary interpretation based on the "close-reading" of individual works, Jockers describes how this new method of studying large collections of digital material can help us to better understand and contextualize the individual works within those collections.