Congratulations to the SSHRC‑funded winners of the 2013 Digging into Data Challenge

Digging into Linked Parliamentary Data

(Principal Investigators: Maarten Marx, University of Amsterdam, The Netherlands; Jane Winters, University of London, United Kingdom; Christopher Cochrane, University of Toronto, Canada)

This project brings together political scientists, historians and computational linguists from Canada, the Netherlands and the United Kingdom to enable large‑scale analysis of the proceedings of three parliaments from circa 1800 to the present day. This data reflects any event of significance over the past 200 years, and will be enhanced during the course of the project to shed light on developments across different nations, cultures and systems of political representation. The project will deliver a common and extensible format for encoding parliamentary proceedings; a joint, linked dataset covering all three jurisdictions; a range of tools to facilitate the longitudinal study of parliamentary data; and a series of case studies to test and inform the chosen methodology.

SSHRC contribution: $98,750

Global Currents: Cultures of Literary Networks, 1050-1900

(Principal Investigators: Elaine Treharne, Stanford University, United States; Lambert Schomaker, University of Groningen, The Netherlands; Andrew Piper, McGill University, Canada)

This project undertakes the cross‑cultural study of literary networks in a global context, ranging from post‑classical Islamic philosophy to the European Enlightenment. Integrating new image‑processing techniques with social network analysis, we examine how different cultural epochs are characterized by unique networks of intellectual exchange. Research on "world literature" has become a central area of inquiry today within the humanities, and yet, so far, data‑driven approaches have largely been absent from the field. The combined approach of visual language processing and network modeling will allow the researchers to study the non‑western and pre‑print textual heritages so far resistant to large‑scale data analysis, as well as to develop a new model of global comparative literature that preserves a sense of the world’s cultural differences.

SSHRC contribution: $124,942

Project Arclight: Analytics for the Study of 20th Century Media

(Principal Investigators: Eric Hoyt, University of Wisconsin‑Madison, United States; Charles Acland, Concordia University, Canada)

Commercial media companies have embraced computational analytics to study discussions of media content across social media data streams. Data mining companies identify actors and TV shows that are “trending” in global popularity, along with more granular analyses of regional tastes, social networks, and discourse. We propose to apply a similar methodology toward the study of film and media history. Project Arclight will create a web‑based tool that enables the study of 20th‑century American media through comparisons across time and space. The Arclight tool will be built using several popular open‑source technologies, including Ruby on Rails, JavaScript, and Solr. The tool will analyze roughly two million pages of public domain publications derived from two repositories: the Media History Digital Library (which uses the Internet Archive’s scanning, hosting, and preservation services) and the Library of Congress Chronicling America collection.

SSHRC contribution: $82,760

MIning Relationships Among variables in large datasets from CompLEx systems (MIRACLE)

(Principal Investigators: C. Michael Barton, Arizona State University, United States; Tatiana Filatova, University of Twente, The Netherlands; Terence P. Dawson, University of Dundee, United Kingdom; Dawn Cassandra Parker, University of Waterloo, Canada)

Social scientists have used agent-based models (ABMs) to explore the interactions and feedbacks among social agents and their environments. The bottom‑up structure of ABMs enables simulation and investigation of complex systems and their emergent behavior with a high level of detail; however, the stochastic nature and potential combinations of parameters of such models create large non‑linear multidimensional “big data,” which are difficult to analyze using traditional statistical methods. This project seeks to address this challenge by developing algorithms and web‑based analysis and visualization tools that provide automated means of discovering complex relationships among variables. The tools will enable modelers to easily manage, analyze, visualize, and compare their output data, and will provide stakeholders, policy makers and the general public with intuitive web interfaces to explore, interact with and provide feedback on otherwise difficult‑to‑understand models.

SSHRC contribution: $125,000

Cleaning, Organizing, and Uniting Linguistic Databases (the COULD project)

(Principal Investigators: Maria Polinsky, Harvard University, United States; Alan Bale, Concordia University, Canada)

The COULD project has five goals. (1) It seeks to transfer existing linguistic data from a variety of different formats into a universal format that will allow linguists to combine and share information, not only with other linguists but also with the public at large. (2) The project will build applications that automatically correct errors, draw attention to inconsistencies, and fill gaps in the data. (3) These automated mechanisms will provide new tools to detect patterns that are not obvious when looking at smaller databases. (4) The project seeks to make the vast amounts of linguistic data, currently only being used by researchers, available to second‑language learners by developing search algorithms that facilitate lesson creation. (5) The project will make data collection easier and thus make language preservation and documentation less dependent on experts. Communities trying to revive endangered languages will benefit directly from this project.

SSHRC contribution: $125,000

Mining Biodiversity

(Principal Investigators: William Ulate Rodriguez, Missouri Botanical Garden, United States; Sophia Ananiadou, University of Manchester, United Kingdom; Anatoliy Gruzd, Dalhousie University, Canada)

The Mining Biodiversity project aims to transform the Biodiversity Heritage Library (BHL) into a next‑generation social digital library resource. The resource will facilitate the study and discussion (via social media integration) of legacy science documents on biodiversity by a worldwide community, and will raise public awareness of changes in biodiversity over time. The project will integrate novel text mining methods, visualisation, crowdsourcing and social media into the BHL to provide a semantic search system.

SSHRC contribution: $125,000

Field Mapping: An Archival Protocol for Social Science Research Findings

(Principal Investigators: Frank Bosco, Virginia Commonwealth University, United States; Piers Steel, University of Calgary, Canada)

In this project, psychology and management scholars from the United States and Canada will collaborate with an expert in online research and classification methods to devise a web application that will (1) enable the encoding of millions of individual findings in a multidisciplinary social science research domain, (2) facilitate complex analyses, and (3) provide open access to members of the scholarly community and the general public. This project provides protocols for the extraction and classification of research findings into a semantic taxonomy. The foundation of this taxonomy will change how researchers search for and analyze findings from big data. The project researchers will develop efficient algorithms to access and analyze research findings. This will lead to the eventual goal: a comprehensive repository of findings from social science research that is updated continuously and responds to dynamic queries.

SSHRC contribution: $124,788

Digging Archaeology Data: Image Search and Markup (DADAISM)

(Principal Investigators: Maarten de Rijke, University of Amsterdam, The Netherlands; Helen Petrie, University of York, United Kingdom; Mark Eramian, University of Saskatchewan, Canada)

Teams from the United Kingdom, Canada and the Netherlands will investigate how we can use interactive systems design in conjunction with image processing and text mining techniques to help archaeologists find, organize and analyze the thousands of image and document resources available to them for answering archaeology research questions.

SSHRC contribution: $124,965

Total SSHRC contribution: $931,205