Post Conference Workshops


Saturday, 5/30, 9am-12pm

Workshop on Text Mining with the HathiTrust Research Center

Saturday, 5/30 - Sunday, 5/31

Software Carpentry and Data Carpentry Workshops


Workshop on Text Mining with the HathiTrust Research Center

An Introduction to Tools for Working with Digitized Text Corpora and Metadata

This workshop is free. Register in advance here:

Workshop to be conducted by:

Sayan Bhattacharyya

Post-doctoral Research Associate

Graduate School of Library and Information Science

University of Illinois at Urbana-Champaign


This workshop is intended for a broad audience, ranging from curious graduate students exploring digital humanities to experienced text mining researchers. The availability of large corpora of digitized text from the world’s research libraries has the potential to transform research in the humanities. Scholars in traditional areas, and not only those who specialize in digital humanities, can benefit from the resources and tools now becoming available in this field. The HathiTrust Digital Library (HTDL) is one of the premier resources for textual corpora. Its growing collection, drawn from some of the world’s foremost research libraries, currently consists of over thirteen million volumes of digitized text and the bibliographic metadata associated with them. Such an extensive corpus makes it possible to scale up inquiry and to ask new kinds of research questions. The HathiTrust Research Center (HTRC), the HathiTrust’s research-oriented affiliate, has been developing sophisticated computational tools, including ones that will support textual analytics even when copyright restrictions prevent scholars from accessing the full-text content.

The workshop will provide a hands-on introduction to the HTDL collection and its metadata, and to the tools and functionalities developed by the HTRC that leverage these resources. Using the HTRC tools as concrete examples, the workshop will orient attendees to the new challenges and opportunities that algorithmic text analysis at such a large scale presents to researchers. The workshop will cover the Secure Hathi Analytics Research Commons (SHARC), the HathiTrust+Bookworm (HT+BW) tool, and the HTRC Extracted Features Dataset. Attendees will be shown how to build their own worksets (small, customized subcorpora from the HathiTrust Digital Library corpus) and how to conduct analyses on them. There will also be a group discussion involving all attendees about the emerging questions that these developments are likely to raise in their own fields, and about how these developments can affirm or disrupt (or both affirm and disrupt simultaneously) established practices of inquiry.


Data Carpentry and Software Carpentry Workshops

Following the HASTAC 2015 conference, Michigan State University will host two separate workshops, one on Data Carpentry and one on Software Carpentry. Both will run on Saturday, May 30 (9am-5pm) and Sunday, May 31 (9am-3pm).

The mission of both Software Carpentry and Data Carpentry is to teach fundamental computational skills to researchers. Software Carpentry focuses on programming best practices for people with some programming experience. Its workshops teach good programming practices in Python, version control with GitHub, and the command line. Data Carpentry teaches basic concepts, skills, and tools for working more effectively with data to those with little to no prior computational experience. Its workshops teach best practices for data organization in spreadsheets, text mining, and data analysis in R.

These workshops will focus on data sets applicable to social scientists, humanists, librarians, and archivists, in keeping with the HASTAC 2015 conference theme, the “Art and Science of Digital Humanities.”

For more information, please see and These workshops are open to the public. The registration form is here: There is a $20 fee to attend either of the workshops.

Both Software Carpentry and Data Carpentry workshops are now full. There is a wait list for each of the workshops. Email [email protected] to add your name to the wait list.

These workshops were made possible by generous support from MSU IT and the Institute for Cyber-Enabled Research.
