“Six Provocations For Big Data” – A Summary
“Six Provocations for Big Data” is a new paper by Danah Boyd and Kate Crawford that caught my attention last week. I read it over the weekend and decided to write a post about it, because the topic is fascinating and highly relevant to what I think about every day. I got my hands on an advance digital copy; the authors also presented at the Oxford Internet Institute’s “A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society” on September 21, 2011.
It’s a relatively quick and engaging read. Boyd and Crawford present a set of thought-provoking statements about the discourse and challenges surrounding “Big Data”: the vast amounts of information being produced, collected and interpreted about what we do, how we do it, where we do it and who or what we’re doing it with.
I’ve provided my summaries of the six key points below (in bold). I hope to follow up shortly with some reflections on the implications of these provocations.
[A quick note on terminology: The authors describe Big Data as “the massive quantities of information produced by and about people, things and their interactions”. The term is also commonly applied to data sets so large that they can't be processed with commonly available tools within a reasonable time (this is obviously a moving definition; as technology advances, the size of the data sets we are able to process grows).]
1. Automating Research Changes The Definition Of Knowledge.
Just as Fordism changed our understanding of labor, the human relationship to work, and society at large, Big Data ushers in a new paradigm for understanding human networks and community. This radical shift brings with it proponents of a philosophy in which the numbers are left to speak for themselves, and quick analysis is favored over sustained, deep reflection. Because the tools currently available to us come with inherent flaws and restrictions, we must critically examine Big Data’s models of intelligibility before they become accepted beliefs.
2. Claims To Objectivity And Accuracy Are Misleading.
As more social spaces become quantifiable through Big Data, the overlap between the social and computational sciences grows. As computational scientists increasingly study society, there’s a danger, stemming from the quantitative nature of this research, that the results will be accepted as fact rather than interpretation. The complex methodological processes that underlie the analysis of Big Data must be made explicit and accounted for.
3. Bigger Data Is Not Always Better Data.
Some of those embracing Big Data dismiss traditional methods for assessing the validity of research as irrelevant, presuming that quantity implies quality. Researchers working with Big Data face many unknowns, yet these limitations are rarely acknowledged. To minimize misinterpretation of Big Data, we should strive for maximum communication and transparency about the underlying research methodologies, the limits of the questions we can ask of a dataset, and which interpretations are appropriate.
4. Not All Data Are Equivalent.
The assumption, made by some, that analyses done with small data can be done better with Big Data is false, because it presumes that data is interchangeable and that context doesn’t matter. The authors point to social network analysis as an example. Researchers in this discipline study networks produced through data traces such as mediated communication and geographical movement: the networks we maintain by creating digital contact lists (‘articulated networks’), and the networks we maintain through communication patterns, cell coordinates or social media interactions (‘behavioral networks’). The trouble with this data is that, while it is valuable for research, it is not representative of the nature and complexity of our social behaviors.
5. Just Because It’s Accessible Doesn’t Make It Ethical.
Big Data research rarely acknowledges the difference between being in public and being public; just because information is accessible doesn’t mean it is ethical for researchers to use it. A good deal of Big Data is created by people who don’t understand that their data will be publicly or semi-publicly available and might be collected for various uses. The authors cite the risk of de-anonymization, the lack of established ethical guidelines and the difficulty of anticipating future consequences as reasons why researchers have a heightened responsibility to uphold accountability and professional standards.
6. Limited Access To Big Data Creates New Digital Divides.
The authors argue that the ecosystem surrounding Big Data is creating a new digital divide between the Big Data rich and the Big Data poor. Much of the enthusiasm surrounding Big Data stems from the belief that it is easily accessible. In reality, only social media companies have access to really large data sets. They have no obligation to make that data available and have full control over who gets to use it. Access also depends on having the skills to collect and analyze the data, which generally favors computational scientists and puts social scientists at a disadvantage. This has serious implications for the types of research questions that are asked of Big Data.
If you’ve come this far: Have you read this paper yourself? What are your thoughts?
