SSHRC’s Big Data panel, “Effective Digital Infrastructure for Research and Training Excellence,” took place at the Canada 3.0 conference in Toronto on Tuesday, May 14, 2013. The panellists were:
Dr. Wendy Cukier, Vice-President of Research and Innovation
Ted Hewitt, Social Sciences and Humanities Research Council
Pat Horgan, Vice-President, Manufacturing, Development and Operations
Tom Jenkins, Executive Chairman and Chief Strategy Officer, Open Text Corporation; SSHRC Council Member
Vincent Larivière, Assistant Professor of Information Science, Université de Montréal
Tom Jenkins: The interesting thing is that almost all of this is about the social sciences. As I often joke, “it’s the revenge of the librarians,” and it really is. We do not have the algorithmic capability to handle some of this complexity; this very much requires human intervention.
Pat Horgan: We try to talk about it in terms of Vs, so let’s break down these Vs: volume, variety and velocity. Volume: if you can see this graph, it runs from 2010 to 2015 (on the right side), and “we are here” is marked in the middle. If we think there is a lot of data today, we are just at the tip of where this is going to go, even in the next three years. This is not a 10-year picture.
Variety: eighty per cent of the data is now unstructured. It would be nice if it were all in a relational database that we could all manage, but it doesn’t come that way any longer. Remember that: eighty per cent is unstructured. Velocity: it’s just in time. There’s so much data that you couldn’t possibly store it all in one contained place so that you can understand it and do something with it.
It’s a “moment in time”: you have to see the data as it’s going by, because if you don’t make a decision on something as it passes, you’ll miss that moment.
Wendy Cukier: Actually, some really fundamental methodological challenges are emerging when we start looking for patterns in data without necessarily knowing what we’re looking for, and that is also where some of the most interesting things emerge. The other issue, of course, is that correlation is not causation. The fact that we have all this data, and that we can analyze it, doesn’t necessarily mean we know anything unless we also do other kinds of research to build an explanatory framework.
Historically, in engineering, computer science and the sciences, researchers have had labs, and budgets have been designed to allow for lab technicians, support and so on. There is nothing comparable for a lot of big data analysis. There are still challenges in getting multidisciplinary research funded. Some of the most powerful opportunities arise when you take someone from the humanities, a computer scientist, a social scientist and someone from the business school and put them in a group to tackle a problem. It’s not easy to get research of that sort supported.
Vincent Larivière: Two aspects here are, I think, very important for the social sciences. First, the digital era has changed the way we actually do research. On the one hand, we have new phenomena like social media, such as Twitter and Facebook, and new data sources with which we can study these new phenomena. On the other hand, it has also changed how we do research, notably through increased research collaboration.
The second point, which I think is very important, is training. We need to remember in the social sciences that … our students come to study the social sciences because they’re afraid of data: they don’t want to deal with mathematics, they don’t want to deal with numbers. I teach a class of information science students. I’ve got 120 of them, and if I show them even one simple statistic, they run away. It is a real challenge for us in the social sciences to train students who will be good at analyzing this data.
Tom Jenkins: Did you ever wonder why Google Maps is as effective as it is? Or FlightAware? Where does all that come from? It comes from federal agency data. In Canada, think of NRCan. Natural Resources Canada probably holds about three quarters of all the data within the Canadian government. If that data starts being published, you can imagine what resource companies in oil and gas, mining and so on could do on a crowdsourcing basis. So there are tremendous opportunities for everyone.
Ted Hewitt: In terms of what we can do as granting councils, I think there is huge potential in tying certain elements of research behaviour to the granting process, in ways that most people expect and will understand, in order to bring the system together.
We will develop this over the next few weeks. We will begin consulting with stakeholders, probably within a month, and then with the broader community toward the summer, in order to develop the kind of policy and regulatory framework that will help us as a country build the network and infrastructure we need to manage data flows, and the research that results from them, over the coming decades.