Data journalism 2.0
Last night, thanks to a productive procrastination binge, I finally got around to watching this documentary about data visualizations in journalism. It’s nice to see data journalism getting some attention, but I also hope that the shiny side of the field doesn’t eclipse writing and analysis.
Even the best visualizations struggle to represent more than two or three variables at once. To analyze most complex problems, we need to discuss data that we can’t picture in 3D. One of the things I’m trying to wrap my head around is how to bring real regression analysis, or at least some knowledge of statistical significance, into journalism. If we really want to use the avalanche of information that is hitting the web, we need to communicate better about statistics and statistical methods.
Interactive visualizations are supposed to replace old-fashioned, static infographics. Today, most writing about statistics is equivalent to the bar charts of old. You know: This number is bigger than that number; some other indicator is going up. Let’s take for granted that the difference isn’t random. Don’t worry your pretty little heads about selection bias. You don’t need to know how or why this survey was conducted. Let’s imply that any two studies with competing conclusions cancel each other out. And on and on and on. It’s time for a new model.
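To make the "this number is bigger than that number" problem concrete, here is a minimal sketch of one standard check, a two-proportion z-test, using only the Python standard library. The poll numbers are hypothetical and the function name is mine; it assumes two independent simple random samples, which real surveys often are not.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two sample proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both groups are the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical example: 52% vs. 48% support, 500 respondents in each poll.
z_small, p_small = two_proportion_z_test(260, 500, 240, 500)

# Same percentages, but 5,000 respondents in each poll.
z_large, p_large = two_proportion_z_test(2600, 5000, 2400, 5000)

print(f"n=500 each:  z={z_small:.2f}, p={p_small:.3f}")
print(f"n=5000 each: z={z_large:.2f}, p={p_large:.5f}")
```

With 500 respondents per poll, the four-point gap gives a p-value of about 0.2, so it could easily be noise. With 5,000 per poll, the same gap is very unlikely to be chance. The headline numbers are identical in both cases, which is exactly why "bigger than" is not enough.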
(h/t to kottke)