CTA techniques include:
Keyword Analysis: keywords and key phrases can be identified or tracked in a text or corpus through various computational means.
Named Entity Recognition (NER): NER extracts and categorizes the proper nouns (such as people, places, and organizations) and other information types in a text or corpus.
Sentiment Analysis: sentiment analysis quantitatively determines affective trends in a document or corpus.
Stylometry: stylometry uses quantitative and statistical methods to characterize literary style.
Topic Modeling: topic modeling estimates the thematic composition (the "aboutness") of a document or of the documents in a corpus.
Word Embedding Modeling: word embedding modeling characterizes the meaning of words in a document or collection by computing which words tend to be associated with one another.
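As a rough illustration of the simplest technique above, keyword analysis, the sketch below counts how often chosen keywords occur in each document of a tiny corpus. The two sample texts, the `tokenize` and `keyword_counts` helpers, and the regex tokenizer are assumptions made for illustration only; the tools listed on this page handle tokenization and counting far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_counts(corpus, keywords):
    """Return {document name: {keyword: raw count}} for every document in the corpus."""
    results = {}
    for name, text in corpus.items():
        counts = Counter(tokenize(text))
        results[name] = {kw: counts[kw] for kw in keywords}
    return results


# Hypothetical two-document corpus, invented for this example.
corpus = {
    "doc_a": "The whale surfaced. The sea was calm, and the whale dove again.",
    "doc_b": "The sea grew rough; no ship left the harbor.",
}

counts = keyword_counts(corpus, ["whale", "sea"])
for name, row in counts.items():
    print(name, row)
```

Raw counts are easiest to read for short texts; when documents differ greatly in length, dividing each count by the document's total token count gives a fairer comparison.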
The Programming Historian: free, peer-reviewed digital humanities tutorials (here linked to their tutorials on CTA)
The Fish and the Painting: Andrew Piper's online textbook on how to use R for humanities text analysis
Hacking the Humanities Tutorials: Paul Vierthaler's YouTube tutorials on how to use Python for humanities text analysis
Tutorials also accompany a number of the tools listed on this page.
Free CTA tools include:
Easy
AntConc: downloadable tool mainly for keyword analyses of a text or corpus
Voyant: browser-based tool mainly for keyword analyses of a text or corpus
Topic Modeling Tool: downloadable tool for topic modeling
Moderate
Lexos: browser-based tool mainly for stylometry
MALLET (MAchine Learning for LanguagE Toolkit): command-line software mainly for topic modeling
Difficult
Python: general-purpose programming language often used for text analysis
R: statistics-oriented programming language often used for text analysis
Tool-Corpus Sets
English-corpora.org: enables keyword analyses of a variety of large corpora
HathiTrust Research Center (Free to Union): set of affordances for analyzing the HathiTrust collection
Stopword Lists (see also NLTK Data)
Stopwords are high-frequency words that carry little meaning on their own (function words such as "a," "an," "of," and "the") and that can hinder some text analyses. Stylometry is a key exception, as it analyzes exactly these words. A stopword list tells the text analysis software which words to ignore.
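A minimal sketch of how a stopword list is applied, assuming a hand-written list of eight words and a simple regex tokenizer. The `STOPWORDS` set, the `content_word_counts` helper, and the sample sentence are all invented for illustration; real stopword lists (such as those in NLTK Data) are much longer.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real lists contain hundreds of words.
STOPWORDS = {"a", "an", "and", "of", "the", "to", "in", "was"}


def content_word_counts(text, stopwords=STOPWORDS):
    """Count the tokens in `text`, skipping any token found in `stopwords`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


# Hypothetical sample sentence.
text = "The captain of the ship spoke to the crew, and the crew cheered."
counts = content_word_counts(text)
print(counts.most_common(3))
```

Without the filter, "the" would be the most frequent word in the sample sentence; with it, the count is led by a content word ("crew").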
Stemmers / Lemmatizers
Stemmers and lemmatizers reduce inflected or derived words (e.g., "thinks," "thinking," "thinker") to their root (e.g., "think"), which can be helpful in text analysis. Lemmatizers attempt to account for a word's context and part of speech (e.g., whether "saw" is a noun or a verb) but can be complex and run slowly; stemmers do not account for context or part of speech but tend to be simple and fast.
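The contrast can be sketched with two toy functions. Neither is a real algorithm: `toy_stem` strips three hard-coded suffixes, and `toy_lemmatize` looks words up in a four-entry table keyed on part of speech. Both the names and the data are assumptions for illustration. Production stemmers (e.g., the Porter stemmer) and lemmatizers use far richer rules and dictionaries.

```python
# Toy stemmer: strips a suffix by rule, with no dictionary and no context.
SUFFIXES = ("ing", "er", "s")


def toy_stem(word):
    """Remove the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Toy lemmatizer: a lookup keyed on (word, part of speech).
LEMMAS = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("thinks", "VERB"): "think",
    ("thinking", "VERB"): "think",
}


def toy_lemmatize(word, pos):
    """Return the lemma for (word, pos), or the word itself if it is not listed."""
    return LEMMAS.get((word, pos), word)


stems = [toy_stem(w) for w in ["thinks", "thinking", "thinker"]]
print(stems)
print(toy_stem("saw"))  # the stemmer cannot tell the verb from the noun
print(toy_lemmatize("saw", "VERB"), toy_lemmatize("saw", "NOUN"))
```

The stemmer treats "saw" identically in every sentence, while the lemmatizer can map the verb to "see" only because it is told the part of speech, which is the extra work that makes lemmatizers slower.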
Free text data repositories include:
DH Resources for Project Building--Data Collections and Datasets: aggregates repositories of text data
DocNow: Twitter datasets
English Corpora: various text datasets
JSTOR Data for Research: 12+ million secondary and primary source texts
NLTK Data: Various datasets from text collections, to stopword lists, to sentiment lexicons, etc.
Project Gutenberg: 60,000+ books, with focus on older, public domain works
Schaffer Library's Databases ("Free" to Union): access to a variety of digitized texts. A number of these offer tools to analyze their text collections.
CTA Projects include:
Hedonometer: sentiment analysis that seeks to measure happiness in a variety of corpora
Viral Texts: traces text reuse in 19th-century American newspapers
What Every1 Says: applies topic modeling to trace how the humanities is covered in the news