Text Mining

What is text mining?

Text mining is a method of turning text into data for computational analysis. It can uncover patterns in large bodies of text (called corpora) that might otherwise stay hidden.

Source: Underwood, T. (2015). Seven Ways Humanists are Using Computers to Understand Text. The Stone and the Shell.
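As a rough illustration of what "turning text into data" can mean, the sketch below counts how often each word appears in a short passage. The function name `term_frequencies` and the sample sentence are made up for this example, and word counting is only one of many possible ways to represent a text.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes.
    # Real projects often need a tokenizer suited to their language and period.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, not taken from any particular corpus.
sample = "The whale surfaced. The crew watched the whale."
counts = term_frequencies(sample)
print(counts.most_common(2))  # [('the', 3), ('whale', 2)]
```

Once a text is reduced to counts like these, the same function can be applied to every document in a corpus and the results compared.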

How do I get started?

In starting a text mining project, think about the following questions:

  • What is your research question?
  • What text(s) do you want to use?
  • Are the texts available in machine-readable form?
  • What is the quality of the texts? Do they need to be corrected/cleaned up?
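For the last question, a minimal cleanup pass might look like the sketch below. It assumes the texts are plain-text files with typical scanning or OCR problems (words hyphenated across line breaks, irregular whitespace, ligature characters). The function name `clean_text` and the sample string are hypothetical, and the right cleanup steps depend on the corpus.

```python
import re
import unicodedata


def clean_text(raw):
    """Apply a few common cleanup steps to a raw plain-text string."""
    # Normalize compatibility characters, e.g. the ligature "ﬁ" becomes "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words split by a hyphen at a line break ("min-\ning" -> "mining").
    # Caution: this also joins genuine hyphenated compounds that happen to
    # break at the end of a line.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse runs of spaces, tabs, and newlines into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


# Hypothetical sample of messy input.
raw = "Text  min-\ning turns\ttext into   data.\n"
print(clean_text(raw))  # Text mining turns text into data.
```

Checking a sample of the cleaned output by hand helps show whether further correction is needed before analysis.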

Find content to mine: library resources

Most of the Library's license agreements do not permit text and data mining or systematic downloading. These are some resources that do allow text mining. For questions about text mining access to other library resources, please contact us.

Find content to mine: other resources

Use text analysis tools

Use text mining tools with corpora included