Please pass on the following information to students, postdocs, and
interested faculty members.
As part of the UBC Big Data and Computational Social Science Research
Cluster initiative, John McLevey and Evan Zhuang will be
teaching a FREE 2-day workshop on "Introduction to Big Data
and Automated Text Analysis for Social Scientists" on June 7th
and 8th, 2019.
Details are also given at the bottom of this e-mail.
Feel free to pass this along to others who may be interested.
Workshop size is limited, and registration will be on a
first-come, first-served basis.
Introduction to Big Data
and Automated Text Analysis for Social Scientists
This workshop offers a practical introduction to fundamentals and
recent developments in the collection and analysis of big data
with an emphasis on automated text analysis. The workshop is
designed with social scientists in mind, but participants from
other fields are also welcome. I assume that participants have
little to no prior experience with methods of automated text
analysis. The workshop makes extensive use of the programming
language Python.
Although having some knowledge of Python is an asset, it is
not necessary. I will provide all participants with fully
executable code for all topics covered in the workshop.
Participants will be encouraged to modify the code to suit
their specific interests, but this requires minimal
programming knowledge and is not required.
Day 1: An
Introduction to Automated Text Analysis – June 7
The first day
will begin with a general introduction to the promises and
pitfalls of big data and automated text analysis in the social
sciences, followed by an overview of the applications of
supervised and unsupervised machine learning. It will conclude
with a comparison of two approaches to collecting text data
from the web: Application Programming Interfaces (APIs) and
The second part
of the first day will focus on the essential first steps of an
automated text analysis. Topics covered will include (1)
natural language processing tasks such as tokenizing text,
normalizing text, part-of-speech tagging, and named entity
recognition; and (2) methods for constructing document-term
matrices, which are required for the use of machine learning
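For readers curious about what tokenizing, normalizing, and building a document-term matrix can look like, here is a minimal sketch using only the Python standard library. It is an illustration and not the workshop's own code; the function names (tokenize, document_term_matrix) and the two example sentences are hypothetical, and a real analysis would likely rely on a dedicated NLP library.

```python
import re
from collections import Counter


def tokenize(text):
    """Normalize text to lowercase and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    token_lists = [tokenize(doc) for doc in documents]
    # The vocabulary is every distinct token, sorted so columns are stable.
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    matrix = []
    for toks in token_lists:
        counts = Counter(toks)
        # Counter returns 0 for terms that do not occur in this document.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical example documents, for illustration only.
docs = [
    "The council voted on the climate plan.",
    "Climate activists praised the plan.",
]
vocab, dtm = document_term_matrix(docs)
for term, column in zip(vocab, zip(*dtm)):
    print(term, column)
```

Each row of the matrix is one document and each column is one term, which is the shape most machine learning tools expect as input.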
Day 2: Analyzing Unstructured Text Data – June 8
The second day
picks up where the first day left off. We will begin with
applications of unsupervised learning to discover latent
themes and topics in text. We will focus on three different
approaches: (1) the vector space model, text similarity, and
cluster analysis; (2) topic modelling; and (3) semantic
network analysis. In the afternoon, we will focus on the use
of supervised learning to scale up traditional content
analysis. We may also cover (1) methods for sentiment analysis and
classifying text by political ideology, and (2) approaches to
integrating unsupervised and supervised machine learning.
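To give a flavour of the vector space model and text similarity mentioned above, here is a minimal sketch that represents each document as a vector of raw term counts and compares documents by cosine similarity. It is a simplified illustration under stated assumptions and not the workshop's own material: the helper names (term_counts, cosine_similarity) and the example texts are hypothetical, tokens are split on whitespace only, and no term weighting (such as TF-IDF) is applied.

```python
import math
from collections import Counter


def term_counts(text):
    """Represent a document as a sparse vector of lowercase term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    shared_terms = set(a) & set(b)
    dot = sum(a[term] * b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty document has no direction, so treat similarity as zero.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example documents, for illustration only.
doc_a = term_counts("carbon tax policy debate")
doc_b = term_counts("carbon tax protest")
doc_c = term_counts("hockey playoff schedule")

print(cosine_similarity(doc_a, doc_b))  # documents sharing two terms
print(cosine_similarity(doc_a, doc_c))  # documents sharing no terms
```

Scores range from 0 (no shared terms) to 1 (identical term proportions), and a matrix of such pairwise scores is one common starting point for cluster analysis.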
John McLevey is an Associate
Professor in the Department of Knowledge Integration and the
Department of Sociology & Legal Studies at the University
of Waterloo. He primarily works in the areas of computational
social science and social network analysis, with substantive
interests in science and evidence-based policymaking,
environmental politics and governance, social movements, and
cognitive social science.
As a computational social scientist, Dr. McLevey’s most general
research goal is to advance our knowledge of how social
networks and institutions affect cognition and behaviour —
including the formation and diffusion of knowledge, beliefs,
biases, and behaviours — and the social and political
consequences of those complex transmission processes. His work
is funded by research grants from SSHRC and an Early
Researcher Award from the Ontario Ministry of Research and
Innovation. Among other things, Dr. McLevey is currently
writing a methods book on computational social science for
Sage. You can learn more about his work at johnmclevey.com and networkslab.org.
Lab Assistant: Yufan Zhuang
is a Graduate Research
Assistant in the Department of Computer Science and a Master of
Science candidate in the Data Science Institute at Columbia
University, as well as a graduate research intern at IBM Research
in the summer of 2019. He primarily works in natural language
processing and probabilistic programming, and sometimes wanders
into other areas, including computational sociology, psychology,
and computer security.
Yufan’s current primary research
goal is to develop a robust, generalizable deep learning
framework for sequential classification/generation. He has
also done work in probabilistic topic modelling with
applications in sociology and exploring bias in machine