Voyant is a free, web-based text analysis tool and one of the easiest to use. Users upload or paste text, and the program automatically computes word frequencies and collocates and displays them graphically.
MALLET (MAchine Learning for LanguagE Toolkit) is a collection of tools that facilitate document classification, sequence tagging, and topic modeling. There is also an add-on toolkit, GRMM (Graphical Models in MALLET), which supports inference in general graphical models.
The Stanford NLP Group makes some of its natural language processing software available to everyone. It provides statistical, deep learning, and rule-based NLP tools for major computational linguistics problems, which can be incorporated into applications that need human language technology. These packages are widely used in industry, academia, and government.
This collection of text analysis tools, hosted by the University of Alberta, handles XML, HTML, and plain text. Upload documents to extract common words, determine collocates, separate HTML tags, and extract XML-tagged information.
WordSeer is a collection of text analysis tools targeted at humanities scholars that includes side-by-side comparison, grammatical search, and document/sentence/word-set features.
Data for Research (DfR) is a free data mining tool for journal content on JSTOR, available to the public. DfR offers a powerful faceted search interface, online viewing of document-level data, and bulk-downloadable datasets (including word frequencies, citations, key terms, and n-grams).
This is the classic interface designed by Google, which allows users to plot the frequency of single words and short phrases over time in a large subset (~5 million books) of the Google Books corpus.
Developed by the Culturomics team at Harvard, this interface limits itself to digitized texts that have metadata (full title, publication date, publication place, etc.) on OpenLibrary.org. As a result, users can run queries in highly selective corpora based on subject (books on world history, American books on science, etc.), though these corpora are much smaller than those in the full Google Books collection.
This is the only interface of those above that allows users to search longer strings of words from the corpus. It offers the same corpora as are available in N-Grams, including American works (155 billion words), British works (34 billion words), fiction (91 billion words), Spanish works (45 billion words), and a 1,000,000-book sample (89 billion words).
Collection of more than 5,000 texts, more than 2,000 of which have been marked up and keyed in by hand. Includes a large number of early English texts from the ECCO-TCP collection as well as all of Shakespeare and other works.
Monk at the University of Illinois provides access to the full text of 525 works of pre-1900 American literature as well as many of the works of William Shakespeare.
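Several of the tools above report word frequencies, collocates, and n-grams. For readers unfamiliar with these measures, the following is a minimal Python sketch of what each one counts. It is an illustration only, not the method any of the listed tools actually uses; the function names, the simple regular-expression tokenizer, and the window size are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Count every run of n consecutive words."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def collocates(tokens, keyword, window=2):
    """Count words appearing within `window` positions of each occurrence of `keyword`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Skip position i itself so the keyword is not counted as its own collocate.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


# Hypothetical sample text, used only to exercise the functions.
sample = "the cat sat on the mat and the cat slept"
tokens = tokenize(sample)

print(word_frequencies(tokens).most_common(3))
print(ngrams(tokens, 2).most_common(3))
print(collocates(tokens, "cat", window=1).most_common(3))
```

Real tools typically add steps this sketch omits, such as stop-word filtering, lemmatization, and statistical measures of collocation strength.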