Google Dataset Search

This may be of interest to the text-as-data community:

In today’s world, scientists in many disciplines and a growing number of journalists live and breathe data. There are many thousands of data repositories on the web, providing access to millions of datasets; and local and national governments around the world publish their data as well. To enable easy access to this data, we launched Dataset Search, so that scientists, data journalists, data geeks, or anyone else can find the data required for their work and their stories, or simply to satisfy their intellectual curiosity.

IC2S2 2018 conference keynote

Due to advances in machine learning and computational techniques, and the proliferation of digital footprints, human and societal behavior that was previously unquantifiable and unobservable now generates data that can be collected and analyzed to make insights and predictions.

Find the keynote address video here.

New dataset: TERRIER

From the Terrier website:

TERRIER (Temporally Extended, Regular, Reproducible International Event Records) BETA is a new machine coded event dataset produced from a historical corpus ranging from 1979 to 2016, available for download at OSF. Event data generates structured records of political events described in text in the form of (1) a source actor (2) committing an action (3) against a target. The political events recorded in the dataset include a wide range of political behaviors: meetings, statements, provision of aid, protests, attacks, and violence. This dataset is an initial beta release of the data, lacking event geolocation. We encourage researchers to carefully check the data they use and to contact our team with any issues they uncover regarding the data by opening a thread on our discussion forum.

The dataset was produced by a team at the University of Oklahoma as part of the NSF RIDIR grant “Modernizing Political Event Data” SBE-SMA-1539302. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or the U.S. government.
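For readers new to event data, the source-action-target structure described above can be sketched in a few lines of Python. This is only an illustration: the field names, actor codes, and `year` field are hypothetical and do not reflect TERRIER's actual schema, and the records are invented rather than drawn from the dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One coded event: a source actor commits an action against a target."""
    source: str  # actor committing the action (hypothetical code)
    action: str  # type of political behavior, e.g. "meeting" or "protest"
    target: str  # actor the action is directed at (hypothetical code)
    year: int    # assumed field; real event data usually carries a full date


# Invented records for illustration only; not taken from TERRIER.
events = [
    Event("GOV_A", "meeting", "GOV_B", 1985),
    Event("GOV_A", "statement", "GOV_B", 1985),
    Event("GROUP_C", "protest", "GOV_A", 2001),
]


def count_by_action(records):
    """Count how many events fall under each action type."""
    return Counter(e.action for e in records)


def directed_dyads(records):
    """Count events per (source, target) pair, keeping direction."""
    return Counter((e.source, e.target) for e in records)


if __name__ == "__main__":
    print(count_by_action(events))
    print(directed_dyads(events))
```

Keeping the source and target as an ordered pair preserves who acted on whom, which is usually the point of analyzing event data as directed dyads.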

What is QuantText?

Welcome! QuantText is a repository for text-as-data research and pedagogy in Political Science and related fields (Cognitive Science, Psychology, Linguistics, Computer Science, etc.). It contains a collection of text-as-data syllabi, programs from previous text-as-data conferences, links to future conferences, a list of computational tools for quantitative text analysis, and political science articles that use text-as-data methodology. Registered users can also upload links to their own text-as-data sets (e.g., from Harvard Dataverse or a personal webpage) and articles. Visit the Register page to log in.

Please email me with any questions: leah [dot] windsor [at] memphis [dot] edu