Researcher Resources for Publishing & Learning

Mining Library Database Content

Adam Matthew: "Provides documented API access to the full data set (full text if available but not images) locked down by IP address and API key for security." More information:

Gale Primary Resources: "Gale delivers this content upon customer request and in a cost-effective manner for the use of text and data mining." 

IEEE Xplore: Provides a metadata API for which UofM students and faculty can register to get a key: 

JAMA: Subject to certain terms and conditions, Site Licensees and their Authorized Users may conduct text and data mining of metadata of the Licensed Material. Each Authorized User must agree to these text and data mining terms and must recommit to them each time they begin a new search. Access, read, and agree to the JAMA Text and Data Mining Policy and License Agreement:

JSTOR: Define and request datasets of JSTOR content, or download a sample dataset for teaching text mining techniques:

SAGE Stats: "SAGE Stats makes research easy by providing, in one place, annual measures dating back more than two decades."

ScienceDirect Elsevier:  "We have adopted a license–based approach which automatically enables researchers at subscribing institutions to text mine for non-commercial research purposes and to gain access to full text content in XML for this purpose."

Wiley Online Library: Wiley grants subscribers and other lawful users the right to text and data mine online content for non-commercial purposes.

Mining Open Access Content

This list is not meant to be comprehensive; it is a selection of open access text corpora:

BioMed Central:

Chronicling America:

HathiTrust:

Internet Archive:

Project Gutenberg: 

Public Library of Science (PLOS):
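
As a starting point for working with any of the open access sources above, the following is a minimal sketch of a first text mining step: counting word frequencies in a plain-text file that has already been downloaded. It assumes the text uses the "*** START OF ..." and "*** END OF ..." marker lines commonly found in Project Gutenberg plain-text files; the function names strip_license_text and word_frequencies are hypothetical helpers written for this illustration, not part of any provider's tooling. Other sources will need their own clean-up rules, and each provider's terms of use still apply.

```python
import re
from collections import Counter

# Assumption: marker lines in the style commonly seen in Project Gutenberg
# plain-text files. Adjust or remove these patterns for other sources.
START_RE = re.compile(
    r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_license_text(raw: str) -> str:
    """Return the text between the start and end markers.

    If a marker is missing, that side of the text is left untrimmed.
    """
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()


def word_frequencies(text: str) -> Counter:
    """Count lowercase word tokens made of letters and apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


if __name__ == "__main__":
    # A tiny in-memory stand-in for a downloaded file.
    sample = (
        "Header text\n"
        "*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
        "The cat saw the dog.\n"
        "*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
        "License text\n"
    )
    body = strip_license_text(sample)
    print(word_frequencies(body).most_common(3))
```

To use this on a real file, read it with open(path, encoding="utf-8").read() and pass the result to strip_license_text before counting. The tokenizer is deliberately simple and only handles unaccented Latin letters; a dedicated text analysis library is a better choice for serious work.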