A Plain-English Explanation of Correlation
It came to my attention recently (primarily through my wife’s hunting for online resources on the topic) that there are no (or at least few) easy-to-understand plain-English explanations of what a correlation is or how to appropriately interpret it. Since percentages (which are fairly straightforward) and correlation (which is not) seem to have become the de facto statistics-to-be-reported by news agencies, this seems like a hole that needs to be plugged.
After all, the first Google result for “What is a correlation?” should not have formulas on it! If you could understand formulas, you probably wouldn’t be Googling “What is a correlation?” to begin with!
So here you go – my explanation of correlation as I would explain it to my undergraduate classes.
Correlation is an index of linear relationship between variables. Each of these concepts requires a little explanation.
A variable is simply the scientific way to refer to a set of measured numbers. For example, if I went to a classroom and measured the height of every person in that class, height would be a variable, and all of my measurements would be values contained within that variable. We call them variables because the values they contain vary – a variable can hold any number of values gathered for a particular measurement, although outside forces usually constrain those values to particular ranges.
An index is called an index because it is not a direct measure of anything, which is one of the reasons most people find indexes so confusing. You see a correlation of 0.5 and think “Well, 0.5 of what?” The answer is – “of nothing.” Indexes (including correlation) have no unit of measurement. For comparison, length is not usually measured with an index because its units are meaningful (e.g. 0.5 inches).
The relationship between two variables is captured by something called covariance, which is the fancy, measurable way to describe “how much are two variables related?” For example, imagine you have two paired variables:
X = 1, 2, 3
Y = 4, 5, 6
If you imagine those data points as pairs (e.g. [1,4], [2,5], [3,6]), you’ll see a clear pattern. When the first number goes up, the second number goes up predictably, which means we have a perfect relationship. If you graph those points, you’ll also notice that they form a straight line – this is what we call a linear relationship. Now replace Y with a second set of data:
Y = 4, 5, 7
If you imagine this new set of Ys with the previous set of Xs, you’ll still see a pattern, but it’s not quite as predictable as it used to be. You could still draw a line through the middle of your data, but it wouldn’t quite touch every point. It is still a strong positive (+) relationship, but it is no longer perfect.
The more predictable one set of numbers is from another, the stronger the relationship. The problem is that covariance also depends on the scale the numbers are measured on. For example, consider these two sets of two variables:
X1 = 1, 2, 3
Y1 = 4, 5, 6
X2 = 10, 20, 30
Y2 = 40, 50, 60
The strength of these two relationships is obvious – both are perfect relationships. However, because the numbers in the second dataset are spread ten times further apart, its covariance is 100 times larger. That’s a problem. We don’t want a raw amount of predictability that changes with the units of measurement. We want to be able to compare relationships directly.
Thus was born the need for correlation – a standardized covariance. Instead of using covariance, which grows or shrinks with the scale of the numbers, statisticians decided to create a scale so that the relationship between any pair of variables, no matter what their original units were, could be compared directly.
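If you are comfortable reading a little code, here is a minimal Python sketch of that idea. The height and arm-span numbers are made up purely for illustration, and the helper names (mean, covariance, correlation) are hypothetical ones chosen for this sketch rather than part of any library. The same five people are measured in inches and then in centimetres: the covariance changes with the units, while the correlation does not.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance: summed products of paired deviations, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations (a standardized covariance)."""
    sd_x = math.sqrt(covariance(xs, xs))  # covariance of a variable with itself is its variance
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Made-up measurements for five people, in inches (illustrative only).
height_in = [60, 64, 66, 70, 72]
armspan_in = [59, 65, 66, 69, 73]

# The very same people, converted to centimetres.
height_cm = [h * 2.54 for h in height_in]
armspan_cm = [a * 2.54 for a in armspan_in]

cov_in = covariance(height_in, armspan_in)
cov_cm = covariance(height_cm, armspan_cm)
r_in = correlation(height_in, armspan_in)
r_cm = correlation(height_cm, armspan_cm)

print("covariance in inches:", cov_in)
print("covariance in cm:    ", cov_cm)
print("correlation in inches:", r_in)
print("correlation in cm:    ", r_cm)
```

The two covariances differ by a factor of 2.54 squared even though nothing about the people changed, while the two correlations agree. That is the whole point of standardizing.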
Thus, correlation was anchored at two numbers to indicate the magnitude of the relationship: 0 and 1. 1 represents the perfect relationship described above. 0 represents the total absence of a relationship.
But what about situations where one piece of data goes in reverse? Consider these variables:
X = 1, 2, 3
Y = 6, 5, 4
This relationship is still perfect, but the direction between the two is reversed. This is called a negative (-) relationship. Positive and negative thus indicate the direction of the linear relationship, while the magnitude represents the strength of the linear relationship.
A correlation is thus the combination of an indicator of direction (- or +) and a number representing the relationship’s magnitude (0 to 1). Overall then, it appears as if correlations range from -1 to +1. But it’s important to remember that a -1 is just as strong (and just as perfect!) a relationship as a +1.
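To put numbers on the three small examples used above, here is a minimal Python sketch. It assumes the standard Pearson correlation, which matches the "standardized covariance" described in this post, and pearson_r is a hypothetical helper name written just for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the pairs move together, and how much each variable moves on its own.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

x = [1, 2, 3]
perfect_positive = pearson_r(x, [4, 5, 6])
strong_positive = pearson_r(x, [4, 5, 7])
perfect_negative = pearson_r(x, [6, 5, 4])

print(round(perfect_positive, 3))  # 1.0
print(round(strong_positive, 3))   # 0.982
print(round(perfect_negative, 3))  # -1.0
```

The sign carries the direction and the size carries the strength: the reversed data come out at -1, which is exactly as strong as +1.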
Now that you have all the background knowledge, try out that Flash program above to play with correlations!
Keep N=200 and simulate the following correlations in this order to see what they look like:
You should notice that at 0, the data just looks like a big shapeless cloud while at +1, it’s a very straight line. When you change to negative correlations, that’s still true, but the line changes direction. Move the sliders back and forth to get a feel for what correlations tend to look like.
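If you cannot run the interactive tool, a similar experiment can be approximated in a few lines of Python. This is not the simulator's actual code, only a sketch of one common recipe: draw X from a bell curve, then build Y as a blend of X and fresh random noise so that the two share a chosen target correlation. The names simulate_pairs and sample_r, the seed value, and the list of targets are all assumptions made for this illustration.

```python
import math
import random

def simulate_pairs(target_r, n=200, seed=0):
    """Draw n (x, y) pairs whose underlying correlation is target_r."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Blend x with independent noise; more noise means a weaker relationship.
        y = target_r * x + math.sqrt(1 - target_r ** 2) * noise
        xs.append(x)
        ys.append(y)
    return xs, ys

def sample_r(xs, ys):
    """Pearson correlation actually observed in the simulated sample."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

for target in (0.0, 0.5, 1.0, -0.5, -1.0):
    xs, ys = simulate_pairs(target)
    print("target:", target, "observed:", round(sample_r(xs, ys), 2))
```

The observed value will usually land near the target without matching it exactly, except at +1 and -1 where there is no noise left. That gap is sampling error, and it shrinks as n grows.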
Another thing you should notice: correlation is simply used to describe data. If a correlation is found between two variables, that doesn’t tell you anything about whether or not one variable caused the other. The golden rule of correlation is simple and even sort of rhymes: correlation is not causation. Causation can only be proven through carefully designed experiments – statistics alone do not have the power to prove anything caused anything else.
If you’d like more information on the interactive correlation simulator above, please check out this additional information on my professional webpage. If you are using this page as part of a lesson in your classroom, I’d appreciate a comment letting me know who you are, what institution you work for, and what course you are using it in. The tool above can also be used to teach a little bit about sampling error!
And finally, if you are an educator or represent an organization interested in partnering with a major research university to use technology-enhanced hiring or training techniques like this tool, please leave a comment or visit my laboratory (the webpage of which may still be under construction!).