The easiest-to-use web-based text analysis tool. Voyant is free and allows users to upload or paste text. The program automatically determines word frequencies and collocates and displays them graphically.
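To make the terms "word frequencies" and "collocates" concrete, here is a minimal Python sketch of the two counts. It is an illustration only, not how Voyant itself is implemented: the tokenizer, the function names, the two-word window, and the sample sentence are all assumptions chosen for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how many times each token occurs."""
    return Counter(tokens)


def collocates(tokens, keyword, window=2):
    """Count the words found within `window` tokens of each occurrence of `keyword`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the keyword occurrence itself
                counts[tokens[j]] += 1
    return counts


# Hypothetical sample text, used only for illustration.
sample = "The whale swam. The white whale dived, and the crew watched the whale."
tokens = tokenize(sample)
freqs = word_frequencies(tokens)
near_whale = collocates(tokens, "whale", window=2)

print(freqs.most_common(3))
print(near_whale.most_common(3))
```

A real tool would typically also filter out stopwords such as "the" before ranking collocates; that step is left out here to keep the sketch short.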
MALLET (MAchine Learning for LanguagE Toolkit) is a collection of tools that facilitate document classification, sequence tagging, and topic modeling. There is also an add-on toolkit, GRMM (Graphical Models in MALLET), for inference in general graphical models.
The Stanford NLP Group makes some of its natural language processing software available to everyone. The group provides statistical NLP, deep learning NLP, and rule-based NLP tools for major computational linguistics problems, which can be incorporated into applications with human language technology needs. These packages are widely used in industry, academia, and government.
This collection of text analysis tools, hosted by the University of Alberta, provides XML, HTML, and plain text analysis. Upload documents to extract common words, determine collocates, separate HTML tags, and extract XML-tagged information.
Data for Research (DfR) is a free data mining tool for journal content on JSTOR, available to the public. DfR offers a powerful faceted search interface, online viewing of document-level data, and bulk downloads of datasets (including word frequencies, citations, key terms, and n-grams).
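For readers new to the term, an n-gram is a run of n consecutive words. The short Python sketch below shows one way to count them. It is illustrative only and does not reflect the format of DfR's downloadable datasets; the function name and sample phrase are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample phrase, used only for illustration.
words = "to be or not to be".split()
bigram_counts = Counter(ngrams(words, 2))
trigram_counts = Counter(ngrams(words, 3))

print(bigram_counts.most_common(1))
```

Counting n-grams across a large corpus and grouping the counts by publication year is the basic idea behind the n-gram viewers described in the entries that follow.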
Developed by the Culturomics folks at Harvard, it limits itself to digitized texts that have descriptive information (full title, publication date, publication place, etc.) on OpenLibrary.org. As a result, users can run queries on highly selective corpora based on subject (books on world history, American books on science, etc.), though these corpora are much smaller than those in the full Google Books collection.
This interface is the only one of the above that allows users to search longer strings of words from the corpus. It offers the same corpora available in N-Grams: American works (155 billion words), British works (34 billion words), fiction (91 billion words), Spanish works (45 billion words), and a 1,000,000-book sample (89 billion words).
Collection of more than 5,000 texts, more than 2,000 of which have been marked up and keyed in by hand. Includes a large number of early English texts from the ECCO-TCP collection as well as all of Shakespeare and other works.
Unless an exception applies, certain textual content on this web page is subject to a Creative Commons Attribution 4.0 International License. To learn more, see the University of Arizona Libraries CC BY copyright policy. This license allows anyone to share and adapt that content as long as proper attribution is given and the license terms are followed.