The Lies That Data Tell
I’ll go ahead and admit it – I love general-audience articles by Frank Schmidt. If you aren’t familiar with him, he is one of the masterminds behind Hunter-Schmidt psychometric meta-analysis, one of the most widely adopted methods for summarizing research results across studies in OBHRM and psychology. I love his general-audience articles because virtually all of them use straightforward examples of how data in the wrong hands can mislead. Frankly, they remind me of a more technical version of Ben Goldacre’s fantastic blog, Bad Science.
His most recent piece, “Detecting and Correcting the Lies That Data Tell” (the full text of which is available here)1, is no exception. Appearing in Perspectives on Psychological Science, it details an example of how statistical significance testing can lead you astray. In his example, he presents 20 correlations, ranging from .02 to .39, found in the research literature comparing decision-making test scores and supervisory ratings of job performance. Half are statistically significant and half are not. From this, he identifies three typical conclusions:
1. The Moderator Interpretation: There is no relationship in half of the organizations, but there is a relationship in the other half.
2. The Majority Rule: Because half are significant and half are not, err on the side of caution and conclude there is no relationship between the two (if one side had a majority of “significances,” go with that).
3. Face-Value: Accept the correlations as they are – there is variation, but the relationship is positive.
Of these three, #1 and #2 are the most common, despite the fact that all three are incorrect (at least in this case).
To show how this could be, Schmidt treats those 20 sample correlations as a distribution of sample-level statistics: their mean is r = .20 with SD = .08. But this is not the whole story – that .08 actually represents two pieces of information combined:
- Actual variation in population correlations
- Variation due to sampling error, the tendency for sample statistics to vary randomly (and predictably) around a population value, and to do so more substantially when sample sizes are small
After applying statistical corrections, it becomes clear that the observed SD is heavily inflated by sampling error. The SD of the underlying population correlations is only .01, meaning the relationship between test scores and supervisory ratings is essentially identical across all of those organizations. That’s a very consistent test! The variation observed in those initial 20 correlations was due to little more than random chance – luck. A science should not be built on luck.
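To make that decomposition concrete, here is a minimal sketch of a Hunter-Schmidt-style “bare bones” meta-analysis in Python. The 20 correlations and sample sizes below are hypothetical stand-ins, not the values from Schmidt’s article; the point is only to show how the observed variance of correlations is split into expected sampling-error variance and residual (“true”) variance.

```python
import numpy as np

def bare_bones_meta_analysis(rs, ns):
    """Hunter-Schmidt 'bare bones' meta-analysis: estimate how much of the
    observed variance in correlations is just sampling error."""
    rs = np.asarray(rs, dtype=float)
    ns = np.asarray(ns, dtype=float)

    # Sample-size-weighted mean correlation
    r_bar = np.sum(ns * rs) / np.sum(ns)

    # Observed (weighted) variance of the correlations
    var_obs = np.sum(ns * (rs - r_bar) ** 2) / np.sum(ns)

    # Expected sampling-error variance: (1 - r_bar^2)^2 / (n - 1), averaged across studies
    var_error = np.sum(ns * (1 - r_bar ** 2) ** 2 / (ns - 1)) / np.sum(ns)

    # Residual variance attributable to real differences across populations
    var_residual = max(var_obs - var_error, 0.0)

    return r_bar, np.sqrt(var_obs), np.sqrt(var_residual)

# Hypothetical stand-ins: 20 correlations averaging about .20, each from a
# sample of 100 -- NOT the actual values from Schmidt's article.
rs = [.02, .05, .08, .10, .12, .14, .15, .17, .18, .19,
      .21, .22, .23, .25, .27, .28, .30, .33, .36, .39]
ns = [100] * 20

r_bar, sd_obs, sd_residual = bare_bones_meta_analysis(rs, ns)
print(f"mean r = {r_bar:.2f}, observed SD = {sd_obs:.2f}, residual SD = {sd_residual:.2f}")
```

Even with these made-up numbers, most of the observed spread disappears once the expected sampling error is subtracted out – which is exactly Schmidt’s point.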
Schmidt goes on to demonstrate that the mean observed correlation (.20) is also biased downward by measurement error, i.e., unreliability. Correlations are predictably attenuated when the measures involved are imperfect.
Consider this example: two supervisors rate the job performance level of a single employee, and their ratings don’t agree. Why not? The reasons are myriad: they observe different aspects of the employee’s job performance, they have personal characteristics (agreeableness, for example) that vary, they have differing interpersonal relationships with their employees, etc. But does that mean that the actual job performance level of the employee is different for each supervisor? Probably not – the employee is performing at a specific level, and the two supervisors are simply interpreting that performance differently. This is unreliability in the criterion measure.
The problem is that the supervisors’ failure to agree on an employee’s performance level does not mean the employee’s actual performance is inconsistent. When this artificial inconsistency is introduced, correlations based on those unreliable ratings are biased downward, producing misleading results (lies!). But because unreliability affects correlations in a predictable fashion, we can apply a statistical correction to estimate the real correlation – the correlation we would have found had we measured actual job performance rather than supervisors’ views of it. In Schmidt’s example, this corrected correlation was .32.
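Here is a small sketch of the classic correction for attenuation (disattenuation) that drives this result, previewing the arithmetic in the next paragraph. The reliability value of .39 is a hypothetical placeholder chosen only so the corrected correlation lands near the .32 Schmidt reports; his article uses its own reliability estimates, and the same formula can also correct for unreliability in the predictor.

```python
from math import sqrt

def disattenuate(r_obs, rel_x=1.0, rel_y=1.0):
    """Classic correction for attenuation: the correlation we would expect
    between true scores, given the reliability of each measure."""
    return r_obs / sqrt(rel_x * rel_y)

# Illustrative numbers only: an observed validity of .20 and a hypothetical
# criterion (inter-rater) reliability of .39, chosen so the corrected value
# lands near the .32 reported in the article; Schmidt's own reliability
# estimates may differ.
r_obs = .20
r_corrected = disattenuate(r_obs, rel_y=.39)   # approximately .32

# Variance accounted for, before and after the correction
gain = r_corrected ** 2 - r_obs ** 2           # approximately .06 (about 6 percentage points)
ratio = r_corrected ** 2 / r_obs ** 2          # approximately 2.56 (about 256% of the original)
print(f"corrected r = {r_corrected:.2f}, gain = {gain:.4f}, ratio = {ratio:.2f}")
```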
That’s substantial! We’ve found that this decision-making test predicts 6.24 percentage points more variance in performance than we originally thought (.32² – .20² = .1024 – .0400 = .0624) – that’s 256% of our original amount!
Schmidt thus demonstrates the importance of a complete understanding of statistical significance testing to advancing organizational science – not only from an academic perspective, but from an applied perspective as well. It is quite easy to mislead yourself when constraining your research to a single organization – much care must be taken in interpretation, lest poor organizational decisions be made.
- Schmidt, F. (2010). Detecting and correcting the lies that data tell. Perspectives on Psychological Science, 5, 233-242. DOI: 10.1177/1745691610369339