I thought many of you on this list would be interested in our latest paper, out today, which presents one of the first large-scale pilot content analyses of JSTOR, DTIC, and the Internet Archive. We hope it will serve as a template for others, help seed a new wave of large-scale internet and literature content analysis research, and open the door to new disciplinary applications such as socio-cultural and area studies work.

For those interested in working with academic literature collections like JSTOR, government document repositories like DTIC, or the open web via the Internet Archive, the paper describes how to work with these collections: their nuances, artifacts, and strengths; lessons learned (for example, how to work with the Internet Archive's archive of 1.6 billion PDFs in the absence of full-text search or metadata); and general workflows.

The vast array of academic literature published by the humanities and social sciences disciplines codifies our collective scholarly understanding of how societies function and the beliefs, ideals, and ethnic, religious, and tribal contexts that undergird global societal behavior, yet this material has been largely absent from the recent computational revolution in the study of culture. Applying temporal, geographic, thematic, and citation algorithms to an archive of more than 21 billion words spanning 1.5 million publications from 7 collections, including the entire contents of JSTOR, DTIC, CORE, CiteSeerX, and the Internet Archive's 1.6 billion PDFs, shows that academic literature offers a powerful new lens onto global culture. Four case studies use this archive to map the Nuer ethnic group and identify its top experts, map the literature on food and water security, explore the thematic underpinnings of the Rwandan genocide, and construct a network over the ethnic groups of the world as seen through the combined academic literature of the past half century.

Kalev Leetaru
2013-2014 Yahoo! Fellow, Georgetown University

