Data Sources

Copyright, fair use, and data

Not all texts are in the public domain, but you can still use copyright-protected texts in digital text analysis projects under the terms of fair use. The Association of Research Libraries (ARL) has published an Issue Brief on Text and Data Mining and Fair Use in the United States that explains what you can legally do when working with copyright-protected texts. Also see this ARL blog post for a summary of the brief. Note that if you are working with copyright-protected texts, you may publish or share the results of your analysis, but not a full reproduction of the protected text.

Unlike texts, artwork, photographs, music, and film, facts and data can't be copyright protected. See this University of Michigan Library page on Exceptions to Copyright: Facts and Data for a longer explanation. Briefly, an entire database can be copyright protected if its data has been selected and arranged in a sufficiently creative way. However, the data contained within the database is still considered public domain. "Anyone can take those facts and reuse or republish them, as long as that person arranges them in a new way."

You still need to be careful, especially with the library's subscription resources, in cases where the licensing agreement for a database contains additional legal restrictions on how the data can be used. Look for the terms and conditions, license agreement, or access and use policies for each database to check for any restrictions.

Whatever kind of data you're working with, always cite your source!

Citing Data

How to Cite Data from Michigan State University Libraries

Cite Data and Statistics from UC Santa Cruz Library
