NYU Text-as-Data Fall Speaker Series

The NYU NLP and Text-as-Data Speaker Series takes place on Thursdays from 4 – 5:30 pm at the Center for Data Science, 60 Fifth Avenue (7th floor common area). The series began with an emphasis on text data in social science applications and now also reflects the growing interest in Natural Language Processing across a variety of disciplines, especially Computer Science and Linguistics. It gives attendees an opportunity to see cutting-edge NLP and other text-as-data work from the social sciences, computer science, and related disciplines. The series is organized by Professors Arthur Spirling and Sam Bowman and is open to anyone affiliated with NYU.

PaCSS keynote speech (Gary King)

The keynote speech for the Preconference on Politics and Computational Social Science is available here. A roundup of the best PaCSS tweets is here.

Ahead of this year’s APSA general meeting, we attended the Politics and Computational Social Science (PaCSS) pre-conference, hosted at Northeastern University. The event brought together political scientists working with large-scale data sets and emerging computational methods.

Google dataset search

This may be of interest to the text-as-data community:

In today’s world, scientists in many disciplines and a growing number of journalists live and breathe data. There are many thousands of data repositories on the web, providing access to millions of datasets; and local and national governments around the world publish their data as well. To enable easy access to this data, we launched Dataset Search, so that scientists, data journalists, data geeks, or anyone else can find the data required for their work and their stories, or simply to satisfy their intellectual curiosity.

IC2S2 2018 conference keynote

Thanks to advances in machine learning and computational techniques, and to the proliferation of digital footprints, human and societal behavior that was previously unquantifiable and unobservable now generates data that can be collected and analyzed to produce insights and predictions.

Find the keynote address video here.

New data set: Terrier

From the Terrier website:

TERRIER (Temporally Extended, Regular, Reproducible International Event Records) BETA is a new machine-coded event dataset produced from a historical corpus spanning 1979 to 2016 and available for download at OSF. Event data are structured records of political events described in text, each consisting of (1) a source actor, (2) an action the source commits, and (3) the target of that action. The events recorded in the dataset cover a wide range of political behaviors: meetings, statements, provision of aid, protests, attacks, and violence. This dataset is an initial beta release and lacks event geolocation. We encourage researchers to check the data they use carefully and to report any issues they uncover by opening a thread on our discussion forum.

The dataset was produced by a team at the University of Oklahoma as part of the NSF RIDIR grant “Modernizing Political Event Data” SBE-SMA-1539302. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or the U.S. government.
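To make the source-action-target structure of event data more concrete, here is a minimal Python sketch of how individual event records might be represented and tallied by action type. The `EventRecord` class, its field names, the `count_actions` helper, and the placeholder actor and action labels are all hypothetical illustrations. They are not TERRIER's actual schema or coding scheme, which should be checked against the documentation distributed with the data on OSF.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    """One political event as a (source, action, target) triple.

    Hypothetical field names; these are NOT TERRIER's actual column names.
    """
    source: str  # actor committing the action
    action: str  # what the source did (e.g. a meeting or a statement)
    target: str  # actor the action is directed against


def count_actions(events):
    """Tally how often each action type occurs across a list of records."""
    return Counter(event.action for event in events)


# Illustrative placeholder records, not real TERRIER data.
events = [
    EventRecord("ACTOR_A", "MEET", "ACTOR_B"),
    EventRecord("ACTOR_B", "STATEMENT", "ACTOR_A"),
    EventRecord("ACTOR_A", "MEET", "ACTOR_C"),
]

print(count_actions(events).most_common(1))  # [('MEET', 2)]
```

Keeping each record as an immutable triple mirrors the (1) source, (2) action, (3) target form described above, so aggregations such as counts per action, per source, or per source-target pair reduce to simple grouping over one field or a tuple of fields.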