Finding Stories in Data

In 1998, a research paper was published which claimed that autism was linked to the MMR (measles, mumps and rubella) vaccine. The results of this research were widely reported in Britain, leading to a sharp decline in uptake of the vaccine in that country (Elliman & Bedford, 2007). Though data journalism was not as prevalent in 1998 as it is now, a simple look at the methodology should have prevented at least some of the British media outlets from reporting this. The supposed relation between autism and vaccines was based on only 12 patients, too few to be representative. In addition, almost all of the other research on the subject found no link at all between the two.

Confirmed cases of measles rose in England and Wales as uptake of the vaccine decreased, even leading to the first measles-related death since 1992 (Health Protection Agency, 2006). It is both an example of sources not being verified properly (as discussed in my previous blog post) and an (early) example of data not being properly validated: the sample was not representative, and other data contradicted the link between autism and vaccination. Later it turned out that the researcher who found the supposed link had committed fraud by manipulating the numbers in his research.

In this blog post we will take a closer look at the validation of data sets. I will illustrate this by showing the process of turning data into a story, and by pointing out the potential pitfalls of this process. By illuminating this, I sincerely hope cases like the MMR vaccine controversy can be prevented.

Turning data into stories

According to Simon Rogers (via http://datajournalismcourse.net), journalism means reporting facts in such a way that people can better understand the issues that matter. Related to that, as a data journalist it is your role to bring data to life. Using numbers, you have to find the best possible way to tell the story of the data. When requesting data, it helps to have a list ready of the questions you want to answer. You need to know what you want to get out of the data set, as it can only help you with the variables it’s made up of. Of course, a data set can inspire questions on its own as well. The fewer numbers you can use to tell the story, the better.

There are four key roles involved (whether for a team, a duo or a lone wolf) when it comes to turning data into stories:

  1. Research
  2. Writing
  3. Development and coding
  4. Designing and visualising

The essential information you can get out of the data comes from asking the five W’s of journalism (Scanlan, 2003):

  • Who? – Finding the source of the data, and verifying how reliable it is. Your piece is also considered to be more reliable when you can be transparent about your source.
  • What? – The point you’re trying to get across, what you’re saying. Tell the story in a clear way that bridges the gap between the data and the reader.
  • When? – The date from which your data stems.
  • Where? – The geolocation your data refers to.
  • Why? – The reasons behind the patterns you find. There may be correlation, but this does not mean that there is causation.

According to Paul Bradshaw (in the third module of the Online Data Journalism Course), the usual starting point is either that you have a question that needs data, or a dataset that needs questioning. He sees the compilation of data as that which defines either of them as data journalism. It’s no surprise, then, that compiling is at the start of his inverted pyramid of data journalism (Bradshaw, 2011).

Inverted Pyramid Theory of Data Journalism

 

Compiling your data is the foundation, as everything is built upon it and you will return to your data at every other stage. Cleaning means removing any errors. Next is the context, which can be found by using the five W’s. Find a story that’s both newsworthy and easy to explain to someone who has never heard about it before. Then, in the combining stage, you bring together two or more data sets so you have multiple sources for the same story. Finally, you can communicate your story: visualize the results, create a narrative, etc.
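
To make the cleaning and combining stages a little more concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the region names, the column names (`uptake_pct`, `measles_cases`), the invented numbers and the helper functions `read_rows`, `clean` and `combine` do not come from Bradshaw’s pyramid or from any real data set.

```python
import csv
import io

# Invented example data (NOT real figures): vaccine uptake per region.
# It contains typical flaws: inconsistent spelling, a duplicate, a missing value.
RAW_UPTAKE = """region,year,uptake_pct
Region A,2004,71.0
region a ,2004,71.0
Region B,2004,
Region C,2004,85.5
"""

# Invented second source: confirmed cases per region.
RAW_CASES = """region,year,measles_cases
Region A,2004,120
Region C,2004,8
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Normalise region names, skip rows with missing values, drop duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        region = row["region"].strip().title()
        value = row["uptake_pct"].strip()
        if not value:
            continue  # missing value: leave the row out rather than guess
        key = (region, row["year"])
        if key in seen:
            continue  # same region and year already recorded
        seen.add(key)
        cleaned.append(
            {"region": region, "year": int(row["year"]), "uptake_pct": float(value)}
        )
    return cleaned


def combine(uptake_rows, case_rows):
    """Attach the case count from the second source to each cleaned row."""
    cases = {
        (r["region"].strip().title(), int(r["year"])): int(r["measles_cases"])
        for r in case_rows
    }
    return [
        dict(row, measles_cases=cases.get((row["region"], row["year"])))
        for row in uptake_rows
    ]


uptake = clean(read_rows(RAW_UPTAKE))
combined = combine(uptake, read_rows(RAW_CASES))
for row in combined:
    print(row)
```

Dropping rows with missing values is only one possible choice. Depending on the story, it may be better to keep them and ask the source why the value is missing.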

What can go wrong?

Based on my first two blog posts, I think it’s fair to conclude that the fundament of all data journalism is the need to be accurate. Hermida (2012) stated something similar, as he named truth, facts and reality as the three values a (data) journalist must adhere to. However, intending to be accurate and actually being accurate are two different things. Various elements of the process of turning data into stories, as discussed in the previous section, can lead to errors.

Being underprepared

Knowing what you’re dealing with saves you a lot of time, and preparing well by going through your data set thoroughly will only help you. It might seem obvious, but it is always good to remind procrastinators (such as myself) that putting time into the preparation is essential. This is especially true when deadlines are nearing, as time pressure can lead to lazy journalism and sloppy verification.

Complicated story

A complicated story in itself is not an error per se, but it can turn into one if you can’t translate the data into understandable language for the layman unfamiliar with the subject.

Errors in the data set

No data is infallible. Nils Mulvad, for instance, discovered while approaching school leaders that the grades in the data released by the Danish ministry of education were miscalculated (Bradshaw, 2013). If a journalist did not check this properly, it could lead to published errors. Check your data again especially if it all seems too good to be true.
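
One simple way to catch this kind of error, when you have access to the underlying values, is to recompute a published figure yourself. The sketch below is only an illustration under my own assumptions: the school names, grades and the helper `find_mismatches` are invented and have nothing to do with the actual Danish data.

```python
from statistics import mean

# Invented illustration data: a published average per school and the
# individual grades that average is supposed to be based on.
schools = {
    "School A": {"published_avg": 7.0, "grades": [6, 7, 8]},
    "School B": {"published_avg": 8.5, "grades": [6, 7, 8, 7]},
}


def find_mismatches(schools, tolerance=0.05):
    """Return (name, published, recomputed) for every school whose
    published average differs from the recomputed one by more than tolerance."""
    mismatches = []
    for name, record in schools.items():
        recomputed = mean(record["grades"])
        if abs(recomputed - record["published_avg"]) > tolerance:
            mismatches.append((name, record["published_avg"], recomputed))
    return mismatches


for name, published, recomputed in find_mismatches(schools):
    print(f"{name}: published {published}, recomputed {recomputed}")
```

A mismatch does not tell you who is wrong, only that it is time to call the source and ask how the number was calculated.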

Errors in interpreting the data set

Especially in big data, it is easy to find correlations between wildly unrelated variables. An example of this is the positive correlation between ice cream sales and violent crimes and murder (Peters, 2013). Correlation does not necessarily imply causation. For lots of funny examples of this, check this site: http://www.tylervigen.com/
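
The ice cream example can be imitated with a few lines of Python. This is a sketch with invented numbers, not real sales or crime figures: I assume that both variables are driven by a third one (temperature) and never influence each other, and the function `pearson` is my own small implementation of the Pearson correlation coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented monthly temperatures; this is the hidden common cause.
temperature = [2, 4, 8, 12, 16, 20, 22, 21, 17, 12, 7, 3]

# Small invented fluctuations, so the relation is not perfectly clean.
noise_a = [1, -2, 0, 2, -1, 1, -1, 0, 2, -2, 1, -1]
noise_b = [-1, 1, 2, -2, 0, -1, 1, 2, -1, 0, -2, 1]

# Both series depend on temperature only, never on each other.
ice_cream_sales = [10 + 3 * t + a for t, a in zip(temperature, noise_a)]
violent_crimes = [50 + 2 * t + b for t, b in zip(temperature, noise_b)]

print("ice cream vs crime:      ", round(pearson(ice_cream_sales, violent_crimes), 3))
print("ice cream vs temperature:", round(pearson(ice_cream_sales, temperature), 3))
print("crime vs temperature:    ", round(pearson(violent_crimes, temperature), 3))
```

Ice cream and crime correlate strongly here even though, by construction, neither causes the other. A strong correlation in your own data deserves the same question: is there a third variable behind both?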

Unable to find structure in data

Unstructured data is not necessarily the end of the world, but as Paul Bradshaw illustrated in the third module of the Online Data Journalism Course, structure makes scraping a lot easier. The more structure and the more repetition, the easier it is to set up a scraper to do the repetitive tasks you would otherwise have to do by hand.
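
To show why repetition helps, here is a minimal scraper sketch that uses only Python’s standard library. It is not the method from the course: the HTML snippet, the school names and the class `TableScraper` are made up for this example. Because every row of the table follows the same pattern, a few lines of logic are enough to pull out all of them.

```python
from html.parser import HTMLParser

# Invented HTML snippet with a repetitive, well-structured table.
HTML = """
<table>
  <tr><th>School</th><th>Average</th></tr>
  <tr><td>School A</td><td>7.0</td></tr>
  <tr><td>School B</td><td>6.5</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows
        self._row = None       # row currently being read, if any
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell of the current row.
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


scraper = TableScraper()
scraper.feed(HTML)
for row in scraper.rows:
    print(row)
```

If the page had no consistent rows and cells, there would be no pattern for the scraper to follow, and you would be back to copying values by hand.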

Unfamiliar with tools

Of course, in a team one can spread the work and thereby avoid working with something one is unfamiliar with. Nevertheless, it might be a good idea to become familiar with the tools data journalists often use (for instance, Google Drive spreadsheets for scraping). Just in case.

Confirmation bias

Confirmation bias is the tendency to seek out or interpret information in a way that confirms someone’s beliefs or hypotheses (Miller et al., 2009). An example of this is the Daily Mail still reporting on links between autism and MMR vaccines, even after the original 1998 research paper had been retracted and discredited (Bloodworth, 2013). Though, instead of confirmation bias, one could speculate about a hidden agenda. There’s fear-mongering, but in the early 2000s many British news outlets were also using the MMR vaccine controversy as a chance to attack the government (Goldacre, 2008). This bias can really creep up on you, so be mindful of it. At times, it’s good to question everything, even yourself. That’s a good note to end on.

References

Bloodworth, J. (2013). Is the Daily Mail killing children? Retrieved from: http://leftfootforward.org/2013/04/is-the-daily-mail-killing-children/

Bradshaw, P. (2011). The inverted pyramid of data journalism. Retrieved from: http://onlinejournalismblog.com/2011/07/07/the-inverted-pyramid-of-data-journalism/

Bradshaw, P. (2013). Ethics in data journalism: accuracy. Retrieved from: http://onlinejournalismblog.com/2013/09/13/ethics-in-data-journalism-accuracy/

Elliman, D., & Bedford, H. (2007). MMR: where are we now? Retrieved from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2066086/

Goldacre, B. (2008). The media’s MMR hoax. Retrieved from: http://www.badscience.net/2008/08/the-medias-mmr-hoax/

Health Protection Agency (2006). Increase in measles cases in 2006, in England and Wales. CDR Weekly (Online), 16(12). Retrieved from: http://www.hpa.org.uk/cdr/archives/2006/cdr1206.pdf

Hermida, A. (2012). Tweets and truth: Journalism as a discipline of collaborative verification. Journalism Practice, 6(5-6), pp. 659-668.

Miller, F. P., Vandome, A., & McBrewster, J. (2009). Confirmation Bias. VDM Publishing.

Peters, J. (2013). When Ice Cream Sales Rise, So Do Homicides. Coincidence, or Will Your Next Cone Murder You. Retrieved from: http://www.slate.com/blogs/crime/2013/07/09/warm_weather_homicide_rates_when_ice_cream_sales_rise_homicides_rise_coincidence.html

Scanlan, C. (2003). Writing from the Top Down: Pros and Cons of the Inverted Pyramid. Retrieved from: http://www.poynter.org/how-tos/newsgathering-storytelling/chip-on-your-shoulder/12754/writing-from-the-top-down-pros-and-cons-of-the-inverted-pyramid/

11 thoughts on “Finding Stories in Data”

  1. Again, just like your last post, awesome writing. Also a very detailed account on the data journalism course modules, which you described to prove your point. A big like about this post is that it relates to the other post as well, so there is a sense of continuity to it. I really don’t have anything else to comment about your post, but I do question your conclusion of accuracy as the fundament of all data journalism. I agree that data journalism should lead to accurate reporting, but is it really the fundament of data journalism? I thought the fundament was to tell stories in a way that people can understand. The question of whether or not the story is accurate is more related to journalism practices in general rather than solely data journalism. Anyhow, maybe I just misunderstood what you meant by ‘the fundament’ of data journalism. I still agree with the high necessity of accuracy. But in all kinds of journalism, not just data journalism. Otherwise, well done!

    • Thanks for the comment 🙂

      “I do question your conclusion of accuracy as the fundament of all data journalism. I agree that data journalism should lead to accurate reporting, but is it really the fundament of data journalism?”

      I agree that translating the data to a compelling story is an important aspect of data journalism, but I think the data being accurate is the most important component. After all, a good writer can still make a compelling story out of inaccurate data, a story that might resonate with people even though it has no evidence. The MMR vaccine controversy I described in this blog post is a good example of this. It was unfounded, and other research contradicted the link between autism and vaccines entirely, yet many papers reported it and it resonated with people, making them refuse vaccines for their children (with all the consequences that has). This was a compelling story for some people, because it played on the distrust towards the government and health institutions. In addition, the vaccine truthers were the ‘rebels’ going against the medical ‘establishment’, which I think people may regard as a compelling narrative. And it is about people’s children, which is bound to get people’s attention. But that’s why I regard accuracy as the fundament of (good) data journalism. As Simon Rogers said in the online course, the visualisations do not matter that much per se; what matters is getting the accurate facts out. I do agree that it is related to journalism in general, but I consider accuracy as being essential for a good data journalist. Creating compelling stories is a close second though 😉

  2. I liked your post a lot. It was very interesting. The vaccines example was very useful, it shows perfectly the confirmation bias and the lack of verification that multiple media outlets have. The title was good. I also think you used the Data Journalism Course material in a nice way, very related to the main idea of the post. I am a journalist and I found your post very useful for instructions on how to do a good job and avoid that things go wrong.

  3. As fitria91 stated above, I like your writing style as well, including the continuity in your blogs!

    You’ve said in your blog: ‘the fewer numbers you can use to tell the story, the better it is’. But is it in all cases? It makes it more readable, but I’m assuming that reducing the numbers of data can lead to misunderstandings as well. Don’t you think it depends on the story you want to tell and it depends who your readers are?

    In one way, you’ve made a distinction between ‘normal’ journalism and ‘data’ journalism; however, you never underpinned the differences between the two. Is it not the case that ‘normal’ journalism has already turned into ‘data’ journalism? With this I mean that the data that is available at the moment ensures that every journalist is a data journalist nowadays.

    I like your reproduction of the elements of the process that could go wrong. However, maybe you could end with a short summary at the end, instead of ending with the ‘confirmation bias’. But anyway! GOOD JOB!

    • “‘the fewer numbers you can use to tell the story, the better it is’. But is it in all cases? It makes it more readable, but I’m assuming that reducing the numbers of data can lead to misunderstandings as well. Don’t you think it depends on the story you want to tell and it depends who your readers are?”
      Indeed it depends on the story, and that’s something I should have clarified in my blog post. Or I should have noted it as ‘with as few numbers as possible without hampering the facts’. Though for a lot of readers a compelling story is more interesting than the numbers, it is nevertheless of great importance to show the essential numbers.
      “Is it not the case that the ‘normal’ journalism has already turned into ‘data’ journalism?”
      That is a good point. I agree that the line between journalism and data journalism is becoming very blurry, at times being non-existent. I guess the difference between journalism and data journalism is that data journalism focuses exclusively on data and the verification of this data, while journalism also consists of interviews, live reporting and such. But both inform each other a lot, I would say.

  4. You have written a very clear story, great job! Only there is one thing that I did not really understand. You were writing about how to turn data into stories and then you said: “The fewer numbers you can use to tell the story, the better.” Did you find this anywhere, or is this your own opinion? Because, I thought that the more confirmation you can get for your story, the better? What are you talking about exactly, a data set itself or the outcomes of an analysis of data? I would think that either way, the bigger your dataset (or the more outcomes you collected) the more valid your story is? Apart from that, I do not have any remarks on your post. It is a very neatly written piece.

    • “Only there is one thing that I did not really understand. You were writing about how to turn data into stories and then you said: “The fewer numbers you can use to tell the story, the better.” Did you find this anywhere, or is this your own opinion? Because, I thought that the more confirmation you can get for your story, the better? What are you talking about exactly, a data set itself or the outcomes of an analysis of data? I would think that either way, the bigger your dataset (or the more outcomes you collected) the more valid your story is?”
      I found that sentence in the Online Data Journalism course, in the first module I believe. What I meant is that it is more compelling for a story if the numbers are woven into the story instead of just presenting the number. It wasn’t referring to the data, as more data can indeed lead to a more valid story. It referred to the fact that the more you can translate the data into a compelling story, the less data you need to show. Of course, all the essential data needs to be visualized and displayed nevertheless, but the less you intimidate your readers with (the amount of) numbers and figures, the better. At least, that is my interpretation.
