There is an art to data journalism, and in many cases that art requires an involved and arduous process. In a recent interview, Simon Rogers, editor of the Guardian’s Datablog and Datastore, discussed many of the issues his team faced when they assembled databases and reports from the WikiLeaks releases. More recently, journalists have been building scads of interactive maps to illustrate news from the disaster in Japan and the political situation in Libya.
A recent story at Poynter looking at the importance of such maps also briefly noted their return on investment:
“The data-driven interactives take a lot of time and teamwork to produce, but they have the greatest value and generate good traffic and time-spent on the site,” said Juan Thomassie, senior interactive developer at USA Today.
So, hard work yields strong engagement. Sounds good. But that same Poynter article included this eye-opening aside: the New York Times has four cartographers. At first blush, my editor cringed at the (seemingly) exceptional number of hours and resources the Times is dedicating to map production. Does a news org really need four cartographers? I turned to Pete Warden, founder of OpenHeatMap, for some informed answers.
Warden walked me through the labor-intensive process — one that may very well justify a full cartography team [Ed. duly noted]. He also discussed a few tools that can streamline data journalism production.
Our interview follows.
What are the steps involved in making an interactive map?
Pete Warden: Usually one of the hardest parts is gathering the data. A good example might be the map Alasdair Allan, Gemma Hobson, and I did for the Guardian (see the screen shot below; find the dataset here).
Alasdair spotted that the Japanese government had released some data on the radiation levels around the country. Unfortunately, it was only available in PDF form, so Gemma and I did a combination of cutting-and-pasting and manual typing to get all the readings and locations into a spreadsheet. Once they were in a spreadsheet, we had to pick exactly what we wanted to display in the final map.
Alasdair took charge of that process and spent a lot of time trying out different scales and units. For example, he tried showing the difference between the current values and the background levels at each location, since some areas had naturally higher levels of radiation. That involved understanding what story we wanted to tell, similar to the way reporters put together quotes and other evidence to support the points of their articles. It also meant repeatedly uploading different versions and iterating until there was something that looked interesting and informative.
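To make the "difference from background" idea concrete, here is a minimal Python sketch of that one calculation. It is not the team's actual workflow: the column names (`location`, `reading`, `background`), the place names, and the numbers are all invented for illustration, and units are deliberately left out.

```python
import csv
import io

# Hypothetical readings, invented for illustration only (not the real dataset).
SAMPLE_CSV = """location,reading,background
Town A,0.12,0.05
Town B,0.08,0.07
Town C,0.30,0.10
"""


def excess_over_background(csv_text):
    """Return {location: reading minus background} for each row of the CSV text."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        # Round to 4 places to hide floating-point noise in the subtraction.
        row["location"]: round(float(row["reading"]) - float(row["background"]), 4)
        for row in rows
    }


excess = excess_over_background(SAMPLE_CSV)

# List locations from the largest excess to the smallest.
for place, value in sorted(excess.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{place}: {value:+.2f}")
```

Mapping the excess rather than the raw reading is one way to keep naturally high-background areas from dominating the picture. Whether that is the right choice depends on the story being told.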
Click here to visit the Guardian’s original story.
Are there tools that can make the data acquisition and mapping processes more efficient?
Pete Warden: I’m obviously a big fan of OpenHeatMap, but I’ve also been very impressed by both Google’s Fusion Tables and the Tableau Public tool. This gives users a lot of choices. My design bias is toward simplicity, so OpenHeatMap’s audience includes users unfamiliar with traditional GIS.
You recently released the Data Science Toolkit. How can the open source tools in that kit be applied to data journalism?
Pete Warden: The toolkit contains a lot of tools based on common requests from journalists. In particular, the command-line tools, like street2coordinates and coordinates2politics, can be very handy for taking large spreadsheets of addresses and calculating their positions, along with information like which congressional districts, neighborhoods, cities, states and countries they are in. You can then take that data and do further processing to break down your statistics by those categories.
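The "further processing" step Warden mentions can be as simple as grouping rows by one of the added categories. The Python sketch below assumes a geocoding step has already attached a district to each address. It does not call the Data Science Toolkit itself, and the field names (`address`, `district`, `amount`) and all values are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical rows, as they might look after a geocoding step has added
# a "district" field to each address. All values are invented.
ROWS = [
    {"address": "Address 1", "district": "District 1", "amount": 100.0},
    {"address": "Address 2", "district": "District 2", "amount": 25.0},
    {"address": "Address 3", "district": "District 1", "amount": 50.0},
]


def totals_by(rows, category, value):
    """Sum the `value` field of the rows, grouped by the `category` field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[category]] += row[value]
    return dict(totals)


by_district = totals_by(ROWS, "district", "amount")
for district, total in sorted(by_district.items()):
    print(f"{district}: {total:.2f}")
```

The same function would work for any other category column, such as neighborhood, city, or state, by passing a different field name.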