In 1998, a research paper was published claiming that autism disorders were linked to the MMR (measles, mumps and rubella) vaccine. The results were widely publicised in Britain, leading to a sharp decline in uptake of the vaccine there (Elliman & Bedford, 2007). Though data journalism was not as prevalent in 1998 as it is now, a simple look at the methodology should have stopped at least some British media outlets from reporting it. The supposed relation between autism and vaccines was based on only 12 patients, far too few to be representative. Moreover, almost all other research on the link between autism and vaccines found no link at all.
Confirmed cases of measles rose in England and Wales as uptake of the vaccine decreased, even leading to the first measles-related death since 1992 (Health Protection Agency, 2006). It’s both an example of sources not being verified properly (as discussed in my previous blog post) and an (early) example of data not being properly validated: the sample was not representative, and other data contradicted the link between autism and vaccination. It later turned out that the researcher who had found the supposed link had committed fraud by manipulating the numbers in his research.
In this blog post we will take a closer look at the validation of data sets. I will illustrate this by walking through the process of turning data into a story, and by pointing out the potential pitfalls along the way. By shedding light on these, I sincerely hope cases like the MMR vaccine controversy can be prevented.
Turning data into stories
According to Simon Rogers (via http://datajournalismcourse.net), journalism means reporting facts in such a way that people can better understand the issues that matter. Related to that, as a data journalist it is your role to bring data to life. Using numbers, you have to find the best possible way to tell the story of the data. When requesting data, it helps to have a list ready of questions you want to answer. You need to know what you want to get out of the data set, as it can only answer questions about the variables it contains. Of course, a data set can inspire questions of its own as well. The fewer numbers you need to tell the story, the better.
There are four key roles involved (whether for a team, a duo or a lone wolf) when it comes to turning data into stories:
- Development and coding
- Designing and visualising
The essential information you can get out of the data comes from asking the five W’s of journalism (Scanlan, 2003):
- Who? – Finding the source of the data and verifying how reliable it is. Your piece is also considered more reliable when you can be transparent about your source.
- What? – The point you’re trying to get across, what you’re saying. Tell the story in a clear way that bridges the gap between the data and the reader.
- When? – The date your data stems from.
- Where? – The geolocation the data refers to.
- Why? – Look for the reason behind the numbers, but be careful: correlation does not imply causation.
According to Paul Bradshaw (in the third module of the Online Data Journalism Course), the usual starting point is either a question that needs data, or a dataset that needs questioning. In his view, it is the compilation of data that defines either of them as data journalism. It’s no surprise, then, that compiling sits at the start of his inverted pyramid of data journalism (Bradshaw, 2011).
Compiling your data is the foundation: everything is built upon it, and you will return to your data at every other stage. Cleaning means removing any errors. Next comes context, which can be found by asking the five W’s: find a story that’s both newsworthy and easy to explain to someone who has never heard of it before. Combining lets you merge two or more data sets, so you have multiple sources for the same story. Finally, you communicate your story – visualise the results, create a narrative, etc.
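The cleaning stage in particular lends itself to a small sketch. Assuming a hypothetical CSV export of school results (the data, column names and values below are all made up for illustration), a minimal Python pass might normalise whitespace, fix comma decimals and drop duplicate rows:

```python
import csv
import io

# Hypothetical raw export: stray whitespace, comma decimals, a duplicate row.
raw = """school,grade
 Northfield High ,"7,2"
Southbank School,"6,8"
Southbank School,"6,8"
"""

def clean_rows(text):
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        school = row["school"].strip()                 # remove stray whitespace
        grade = float(row["grade"].replace(",", "."))  # comma decimal -> float
        key = (school, grade)
        if key in seen:                                # drop exact duplicates
            continue
        seen.add(key)
        cleaned.append({"school": school, "grade": grade})
    return cleaned

rows = clean_rows(raw)
```

Real data sets are messier than this, of course, but the principle is the same: make every cleaning step explicit and repeatable, so you can rerun it when you return to the data at a later stage.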
What can go wrong?
Based on the first two blog posts, I think it’s fair to conclude that the foundation of all data journalism is the need to be accurate. Hermida (2012) stated something similar when he named truth, facts and reality as the three values a (data) journalist must adhere to. However, intending to be accurate and actually being accurate are two different things. Various steps in the process of turning data into stories, as discussed in the previous paragraph, can lead to errors.
Knowing what you’re dealing with saves you a lot of time, and preparing well by going through your data set thoroughly will only help you. It might seem obvious, but it is always good to remind procrastinators (such as myself) that putting time into preparation is essential – especially with deadlines nearing, which can lead to lazy journalism and sloppy verification.
A complicated story in itself is not an error per se, but it can turn into one if you can’t translate the data into language understandable to a layman unfamiliar with the subject.
Errors in the data set
No data is infallible. Nils Mulvad, for instance, discovered while approaching school leaders that the grades in the data released by the Danish ministry of education were miscalculated (Bradshaw, 2013). A journalist who did not check this properly could end up publishing those errors. Check your data especially carefully if it all seems too good to be true.
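One cheap sanity check is to recompute published aggregates from the underlying figures yourself. The schools, grades and tolerance below are all invented for illustration; the point is only the pattern of flagging disagreements:

```python
# Hypothetical published averages versus the raw grades behind them.
published = {"Northfield High": 7.5, "Southbank School": 6.8}
raw_grades = {
    "Northfield High": [7.0, 7.2, 7.4],  # actually averages to 7.2, not 7.5
    "Southbank School": [6.6, 7.0],      # averages to 6.8, as published
}

def find_discrepancies(published, raw_grades, tolerance=0.05):
    """Flag every school whose recomputed average disagrees with the published one."""
    flagged = []
    for school, grades in raw_grades.items():
        recomputed = sum(grades) / len(grades)
        if abs(recomputed - published[school]) > tolerance:
            flagged.append((school, published[school], round(recomputed, 2)))
    return flagged

issues = find_discrepancies(published, raw_grades)
```

A discrepancy like this is not yet a story – it may be a rounding convention or a weighting you don’t know about – but it tells you exactly which numbers to query with the source.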
Errors in interpreting the data set
Especially in big data, it is easy to find correlations between wildly different variables. An example of this is the positive correlation between ice cream sales and violent crime and murder (Peters, 2013). Correlation does not necessarily imply causation. For lots of funny examples of this, check this site: http://www.tylervigen.com/
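The ice cream example can be made concrete with a toy calculation. In the made-up monthly figures below, both series are driven by a third variable (temperature), yet their Pearson correlation comes out very high:

```python
from math import sqrt

# Made-up monthly figures: both rise with temperature, not because of each other.
temperature = [5, 10, 15, 20, 25, 30]
ice_cream_sales = [120, 150, 200, 260, 330, 400]  # scales with temperature
violent_crimes = [40, 45, 55, 65, 80, 95]         # also scales with temperature

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(ice_cream_sales, violent_crimes)
```

A correlation coefficient near 1 here says nothing about ice cream causing crime; controlling for temperature would make the apparent link vanish.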
Unable to find structure in data
A lack of structure is not necessarily the end of the world, but as Paul Bradshaw illustrated in the third module of the Online Data Journalism Course, structure makes scraping a lot easier. The more structure, the more repetition, and the easier it is to set up a scraper to do the repetitive tasks you would otherwise have to do by hand.
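To see why repetition matters, consider a hypothetical well-structured page (the table and values below are invented): because every row follows the same `<tr><td>…</td></tr>` pattern, a few lines of Python with the standard-library parser can pull all of it out.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the same cell pattern repeats for every row,
# which is exactly what makes it easy to scrape.
html = """
<table>
  <tr><td>Northfield High</td><td>7.2</td></tr>
  <tr><td>Southbank School</td><td>6.8</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":          # entering a table cell
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":          # leaving a cell
            self.in_cell = False
        elif tag == "tr":        # row finished: store its cells
            self.rows.append(self.cells)
            self.cells = []

    def handle_data(self, data):
        if self.in_cell:
            self.cells.append(data.strip())

scraper = TableScraper()
scraper.feed(html)
```

If the page were unstructured prose instead of a repeating table, no such simple rule would exist, and you would be back to extracting the figures by hand.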
Unfamiliar with tools
Of course, in a team one can divide the work and thereby avoid working with something one is unfamiliar with. Nevertheless, it might be a good idea to become familiar with tools data journalists often use (for instance, Google Drive spreadsheets for scraping). Just in case.
Confirmation bias
Confirmation bias is the tendency to seek out or interpret information in a way that confirms one’s beliefs or hypotheses (Miller et al., 2009). An example of this is the Daily Mail still reporting on links between autism and MMR vaccines, even after the original 1998 paper had been retracted and discredited (Bloodworth, 2013). Though, instead of confirmation bias, one could also speculate about a hidden agenda. There’s fear-mongering, but in the early 2000s many British news outlets were also using the MMR vaccine controversy as a chance to attack the government (Goldacre, 2008). This bias can really creep up on you, so be mindful of it. At times, it’s good to question everything; even yourself. That’s a good note to end on.
References
Bloodworth, J. (2013). Is the Daily Mail killing children?. Retrieved from: http://leftfootforward.org/2013/04/is-the-daily-mail-killing-children/
Bradshaw, P. (2011). The inverted pyramid of data journalism. Retrieved from: http://onlinejournalismblog.com/2011/07/07/the-inverted-pyramid-of-data-journalism/
Bradshaw, P. (2013). Ethics in data journalism: accuracy. Retrieved from: http://onlinejournalismblog.com/2013/09/13/ethics-in-data-journalism-accuracy/
Elliman, D., & Bedford, H. (2007). MMR: where are we now?. Retrieved from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2066086/
Goldacre, B. (2008). The media’s MMR hoax. Retrieved from: http://www.badscience.net/2008/08/the-medias-mmr-hoax/
Health Protection Agency (2006). Increase in measles cases in 2006, in England and Wales. CDR Weekly (Online), 16(12). Retrieved from: http://www.hpa.org.uk/cdr/archives/2006/cdr1206.pdf
Hermida, A. (2012). Tweets and truth: Journalism as a discipline of collaborative verification. Journalism Practice, 6(5-6), pp. 659-668.
Miller, F. P., Vandome, A., & McBrewster, J. (2009). Confirmation Bias. VDM Publishing.
Peters, J. (2013). When Ice Cream Sales Rise, So Do Homicides. Coincidence, or Will Your Next Cone Murder You?. Retrieved from: http://www.slate.com/blogs/crime/2013/07/09/warm_weather_homicide_rates_when_ice_cream_sales_rise_homicides_rise_coincidence.html
Scanlan, C. (2003). Writing from the Top Down: Pros and Cons of the Inverted Pyramid. Retrieved from: http://www.poynter.org/how-tos/newsgathering-storytelling/chip-on-your-shoulder/12754/writing-from-the-top-down-pros-and-cons-of-the-inverted-pyramid/