# Economic Woman

Econometrics, gender, equity and more.

## Data journalism 2.0

Last night, thanks to a productive procrastination binge, I finally got around to watching this documentary about data visualizations in journalism. It’s nice to see data journalism getting some attention, but I also hope that the shiny side of the field doesn’t eclipse writing and analysis.

Even the best visualizations can’t represent more than two or three variables. To analyze most complex problems, we need to discuss data that we can’t imagine in 3D. One of the things I’m trying to wrap my head around is how to bring real regression analysis, or at least some knowledge of statistical significance, into journalism. If we really want to use the avalanche of information that is hitting the web, we need to communicate better about statistics and statistical methods.

Interactive visualizations are supposed to replace old-fashioned, static infographics. Today, most writing about statistics is equivalent to the bar charts of old. You know: This number is bigger than that number; some other indicator is going up. Let’s take for granted that the difference isn’t random. Don’t worry your pretty little heads about selection bias. You don’t need to know how or why this survey was conducted. Let’s imply that any two studies with competing conclusions cancel each other out. And on and on and on. It’s time for a new model.

(h/t to kottke)

Written by Allison

11 October 2010 at 12:30 pm

Posted in Uncategorized

### 4 Responses

1. How to visualize regression analysis: draw two scatter plots of 10 data points each, with the points in the first following a tighter relationship than those in the second. Then draw a line of best fit through each plot. How do you know you have a line of best fit? Draw a vertical line from each data point to your line. The length of each vertical segment is a residual, the sample counterpart of the error term. Regression finds the line that minimizes the sum of the squared lengths of these segments. If you add up the squared lengths for the tighter group, you get a smaller number than for the looser group. This number is the residual sum of squares.

abraaten

12 October 2010 at 10:42 am
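
The procedure in the response above can be sketched in a few lines of Python with NumPy. This is only an illustration under assumptions: the data are simulated, the true line (slope 2, intercept 1) and the noise scales are invented, and the helper names `sum_of_squares` and `fit_line` are hypothetical.

```python
import numpy as np

def sum_of_squares(x, y, slope, intercept):
    """Add up the squared vertical distances from each point to the line."""
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))

def fit_line(x, y):
    """Least-squares line of best fit: returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope), float(intercept)

# Two made-up scatter plots of 10 points each around the same assumed line.
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
noise = rng.normal(size=10)
tight_y = 2.0 * x + 1.0 + 0.5 * noise   # tighter relationship
loose_y = 2.0 * x + 1.0 + 5.0 * noise   # looser relationship

tight_slope, tight_intercept = fit_line(x, tight_y)
loose_slope, loose_intercept = fit_line(x, loose_y)

tight_ss = sum_of_squares(x, tight_y, tight_slope, tight_intercept)
loose_ss = sum_of_squares(x, loose_y, loose_slope, loose_intercept)

# Nudging the fitted slope away from its least-squares value can only
# make the sum of squares larger.
nudged_ss = sum_of_squares(x, tight_y, tight_slope + 0.1, tight_intercept)

print("tight sum of squares:", tight_ss)
print("loose sum of squares:", loose_ss)
print("tight sum of squares after nudging the slope:", nudged_ss)
```

The tighter cloud produces the smaller sum of squares, and any line other than the fitted one scores worse on the same data.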

2. Of course. But these kinds of two-variable scatter plots can’t communicate anything like the richness of a multiple regression table.

Allison

12 October 2010 at 11:03 am

3. Why not go with spider plots, then?

abraaten

25 October 2010 at 8:01 pm

4. Multivariate analysis is necessary for people to understand confounding: how taking something else into account can change things in observational studies. Check out Schield (2006): “Presenting Confounding Graphically Using Standardization.”

Milo Schield

6 December 2010 at 9:13 am
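
The point about confounding in the last response can be illustrated with simulated data. The sketch below is not taken from Schield (2006). It is a minimal example under invented assumptions: a confounder `z` drives both `x` and `y`, while `x` has no effect of its own on `y`. All variable names and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Assumed data-generating process: z influences both x and y;
# x itself has no direct effect on y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 3.0 * z + rng.normal(size=n)

# Two-variable view: regress y on x alone (plus an intercept).
A_naive = np.column_stack([np.ones(n), x])
naive_coef, *_ = np.linalg.lstsq(A_naive, y, rcond=None)
naive_slope = float(naive_coef[1])

# Multiple regression: regress y on x and z together.
A_adjusted = np.column_stack([np.ones(n), x, z])
adjusted_coef, *_ = np.linalg.lstsq(A_adjusted, y, rcond=None)
adjusted_slope = float(adjusted_coef[1])

print("slope on x, ignoring z:", naive_slope)
print("slope on x, controlling for z:", adjusted_slope)
```

The two-variable fit reports a strong association between `x` and `y`. Once `z` is taken into account, the coefficient on `x` falls to roughly zero, which is the kind of reversal a single scatter plot cannot show.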