
HathiTrust Digital Library at Queen's University

About Text Mining with HathiTrust

The HathiTrust Research Center (HTRC) facilitates large-scale textual analysis of HathiTrust content. Physically located at the University of Illinois and Indiana University, the HTRC provides researchers with access to a number of research and development tools and services.

What is 'text analysis' or 'text mining'?

Text mining involves using computers to reveal information in or about text. According to HathiTrust, computers can identify patterns in word use, structure, and composition that may shift a researcher's perspective on a text.

How does text analysis work?

Suppose a particular text (or set of texts) has been digitized as part of the HathiTrust corpus. You would decide which algorithm to run against the text, set it up with the help of HTRC staff, and then analyze the results yourself. The HTRC's support is instrumental throughout this process, from providing the text to providing the tools and services needed to run the analysis (i.e., the "techie" stuff). For more information, please take a few minutes to view the Introduction to the HathiTrust Research Center (HTRC) video.
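To make "running an algorithm against a text" a little more concrete, here is a minimal Python sketch of one of the simplest text-mining techniques: counting how often each word appears. It is a generic illustration only, not HTRC's tools or API. The function name `word_frequencies`, the simple tokenizing rule, and the sample sentence are all hypothetical; actual analyses of HathiTrust content would be set up through the HTRC's own services.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most common words in text as (word, count) pairs.

    Hypothetical helper for illustration; not part of any HTRC tool.
    """
    # Lowercase the text and treat runs of letters/apostrophes as words.
    # This is a deliberately simple tokenizer, assumed for the example.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter tallies each word; most_common sorts by descending count.
    return Counter(words).most_common(top_n)


if __name__ == "__main__":
    sample = "The whale, the whale! The sea and the whale."
    print(word_frequencies(sample, top_n=3))
```

A real project would typically add steps such as removing common "stop words" (the, and, of) so that more meaningful patterns in word use stand out.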


Additional Resources

HTRC Training Materials

HTRC's current training materials are accessible via their Google Drive folder.

Additional information

If you are interested in learning more or would like help getting started, please contact the HTRC (HathiTrust Research Center) directly:

Support is available to researchers and/or librarians as part of the initiative and can include help from HTRC's developers as well as other specialized expertise.