Thoughts of a Neo-Academic has been named one of the “100 best blogs and websites for innovative academics” by accreditedonlineuniversities.com, which appears to be part of elearners.com. I don’t really know the reputation of elearners.com, so I’m not sure if that’s good or bad – you always have to wonder if such things are just marketing stunts. This is the description of this site that they provide:
The “neo-academic” in question is a young soon-to-be professor of Psychology. He talks at length about technology affecting psychology, the nature and meaning of the GRE, the future of social media, and more.
That’s… mostly right! I am already a professor, those topics are a bit specific, and “young” is a relative term (although I’ve often been told that the key to advancing my career is to dye my hair gray as soon as possible!).
[Interactive Flash correlation simulator: http://rlanders.net/correlation_simulator.swf]
It came to my attention recently (primarily through my wife’s hunting for online resources on the topic) that there are no (or at least few) easy-to-understand plain-English explanations of what a correlation is or how to appropriately interpret it. Since percentages (which are fairly straightforward) and correlation (which is not) seem to have become the de facto statistics-to-be-reported by news agencies, this seems like a hole that needs to be plugged.
After all, the first Google result for “What is a correlation?” should not have formulas on it! If you could understand formulas, you probably wouldn’t be Googling “What is a correlation?” to begin with!
So here you go – my explanation of correlation as I would explain it to my undergraduate classes.
Correlation is an index of linear relationship between variables. Each of these concepts requires a little explanation.
A variable is simply the scientific way to refer to a set of measured numbers. For example, if I went to a classroom and measured the height of every person in that class, height would be a variable, and all of my measurements would be values contained within that variable. We call them variables because the values they contain vary – they can be any number of numbers gathered for any particular measurement, although outside forces usually constrain them to particular ranges.
An index is called an index because it is not a direct measure of anything, which is one of the reasons most people find indexes so confusing. You see a correlation of 0.5 and think “Well, 0.5 of what?” The answer is – “of nothing.” Indexes (including correlation) have no unit of measurement. For comparison, length is not usually measured with an index because its units are meaningful (e.g. 0.5 inches).
Covariance is the fancy, measurable way to describe how strongly two variables are related, and it is what a correlation is built on. For example, imagine you have two paired variables:
X = 1, 2, 3
Y = 4, 5, 6
If you imagine those data points as pairs (i.e. [1,4], [2,5], [3,6]), you’ll see a clear pattern. When the first number goes up, the second number goes up predictably, which means we have a perfect relationship. If you graph those points, you’ll also notice that they form a straight line – this is what we call a linear relationship. Now replace Y with a second set of data:
Y = 4, 5, 7
If you imagine this new set of Ys with the previous set of Xs, you’ll still see a pattern, but it’s not quite as predictable as it used to be. You could still draw a line through the middle of your data, but it wouldn’t quite touch every point. It is still a strong positive (+) relationship, but it is no longer perfect.
The more predictable one set of numbers is from another, the higher the covariance will be. The problem is that covariance is still expressed in the original units of the data, so its size depends on the scale of the numbers themselves. For example, consider these two sets of two variables:
X1 = 1, 2, 3
Y1 = 4, 5, 6
X2 = 10, 20, 30
Y2 = 40, 50, 60
The strength of these two relationships is obvious – both are perfect relationships. However, because the numbers in the second dataset are more spread out (as if the same things had been measured in smaller units), the covariance will also be larger. That’s a problem. We don’t want to know the raw amount of predictability within each dataset. We want to be able to compare the two directly.
Thus was born the need for correlation – a standardized covariance. Instead of using covariance, which gets bigger when the numbers are measured on a larger scale, statisticians decided to create a scale so that the relationship between any pairs of data, no matter what their original units were, could be compared directly.
Thus, correlation was anchored at two numbers to indicate the magnitude of the relationship: 0 and 1. 1 represents the perfect relationship described above. 0 represents the total absence of a relationship.
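If you happen to be comfortable with a little code, here is a minimal Python sketch of what “standardizing” a covariance means. The function names and the tenfold-rescaled dataset are assumptions chosen purely for illustration; this is not how the simulator above is implemented. The sketch computes the sample covariance and then divides it by the two standard deviations to get a correlation.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance: average product of deviations (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    # covariance(v, v) is the variance of v, so the square root of the
    # product of the two variances is the product of the standard deviations.
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Illustrative data (an assumption): a perfect relationship, and the same
# relationship with every value multiplied by 10.
x_small, y_small = [1, 2, 3], [4, 5, 6]
x_large = [10 * x for x in x_small]
y_large = [10 * y for y in y_small]

print(covariance(x_small, y_small), correlation(x_small, y_small))  # 1.0 1.0
print(covariance(x_large, y_large), correlation(x_large, y_large))  # 100.0 1.0
```

The covariance jumps from 1 to 100 when the numbers are rescaled, while the correlation stays at 1 in both cases, which is the whole point of standardizing.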
But what about situations where one piece of data goes in reverse? Consider these variables:
X = 1, 2, 3
Y = 6, 5, 4
This relationship is still perfect, but the direction between the two is reversed. This is called a negative (-) relationship. Positive and negative thus indicate the direction of the linear relationship, while the magnitude represents the strength of the linear relationship.
A correlation is thus the combination of an indicator of direction (- or +) and a number representing the relationship’s magnitude (0 to 1). Overall then, it appears as if correlations range from -1 to +1. But it’s important to remember that a -1 is just as strong (and just as perfect!) a relationship as a +1.
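For readers who like to check things for themselves, the three small examples used in this post can be run through a short Python sketch. The helper name `pearson_r` is an assumption for illustration (it computes the usual Pearson correlation by hand), and the variable names are mine.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of deviation products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3]
r_perfect = pearson_r(x, [4, 5, 6])   # perfect positive relationship
r_strong = pearson_r(x, [4, 5, 7])    # strong, but no longer perfect
r_negative = pearson_r(x, [6, 5, 4])  # perfect negative relationship

print(round(r_perfect, 3))   # 1.0
print(round(r_strong, 3))    # 0.982
print(round(r_negative, 3))  # -1.0
```

The sign carries the direction and the absolute value carries the strength: the first and last examples differ only in sign.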
Now that you have all the background knowledge, try out that Flash program above to play with correlations!
Keep N=200 and simulate the following correlations in this order to see what they look like:
- +1
- +.95
- +.7
- +.3
- 0
- -.3
- -.7
- -1
You should notice that at 0, the data just looks like a big shapeless cloud while at +1, it’s a very straight line. When you change to negative correlations, that’s still true, but the line changes direction. Move the sliders back and forth to get a feel for what correlations tend to look like.
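If you can’t run the simulator, a rough stand-in can be sketched in a few lines of Python. To be clear, this is my own assumption about one reasonable way to generate such data (mixing a variable with independent noise), not a description of how the simulator itself works; the names `simulate` and `pearson_r` and the fixed seed are illustrative.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of deviation products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(target_r, n=200, seed=0):
    """Draw n (x, y) pairs whose population correlation is target_r."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    # y is part x and part independent noise; the weights are chosen so that
    # y keeps a variance of 1 and correlates with x at target_r on average.
    weight = sqrt(1 - target_r ** 2)
    ys = [target_r * x + weight * e for x, e in zip(xs, noise)]
    return xs, ys


for target in (1, 0.95, 0.7, 0.3, 0, -0.3, -0.7, -1):
    xs, ys = simulate(target)
    print(f"target {target:+.2f}   sample r {pearson_r(xs, ys):+.3f}")
```

With 200 points, the sample correlation will usually land close to the target but not exactly on it (except at +1 and -1); that small gap is sampling error.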
Another thing you should notice: correlation is simply used to describe data. If a correlation is found between two variables, that doesn’t tell you anything about whether or not one variable caused the other. The golden rule of correlation is simple and even sort of rhymes: correlation is not causation. Causation can only be proven through carefully designed experiments – statistics alone do not have the power to prove anything caused anything else.
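One way to see why a correlation alone can’t establish causation is to build a dataset where you know neither variable causes the other. The Python sketch below is a made-up illustration: a hidden third variable `z` drives both `x` and `y`, and all names and numbers are assumptions.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of deviation products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1)
n = 2000

# z is a hidden common cause. x and y each depend on z plus their own
# independent noise, and neither one has any effect on the other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)
print(f"correlation between x and y: {r_xy:.2f}")
```

The two variables come out noticeably correlated (around 0.5 on average under this setup) even though, by construction, changing one would do nothing to the other.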
If you’d like more information on the interactive correlation simulator above, please check out this additional information on my professional webpage. If you are using this page as part of a lesson in your classroom, I’d appreciate a comment letting me know who you are, what institution you work for, and what course you are using it in. The tool above can also be used to teach a little bit about sampling error!
And finally, if you are an educator or represent an organization interested in partnering with a major research university to use technology-enhanced hiring or training techniques like this tool, please leave a comment or visit my laboratory (the webpage of which may still be under construction!).
A company called StraighterLine offers a $99/month subscription to its services: online college courses with no time constraints on course completion. The company allows you to participate in courses at your leisure, with the ability to finish them as fast as you can take them – one student completed four courses in two months (effectively, for $200, compared to anywhere from $2,000 to $10,000 over one or two semesters at a brick-and-mortar institution). Each student is assigned an adviser with a Ph.D. accessible via e-mail, and credits can be transferred to one of four partner colleges.
According to Inside Higher Ed, there were five partner colleges, at least as of March. I wonder what happened to the fifth. Currently, this is the list:
- Charter Oak State College, a 4-year public small liberal arts college (SLAC) in Connecticut focused on distance learning
- Fort Hays State University, a mid-sized public university in Kansas
- Lake City Community College, a public community/professional college in Florida
- Potomac College, a for-profit college in the Washington DC metro area
Of that list, as you might guess, the biggest problems have occurred at Fort Hays State. Fort Hays is in most regards a typical public university. As such, its students have a stronger opinion about the kind of cachet their degree will bring them:
“In the short term, this may save FHSU a small amount of money (although this is debatable). In the long term, this could increase the cost of a degree for current students, lower the quality of education and academic standards at FHSU, lead to unemployment for many passionate educators, and eventually cheapen the value of a degree from FHSU for both current and future alumni,” says the Facebook group created by students that has set off the discussion.
Perhaps the most interesting aspects of the debate (at least from the perspective of training research in I/O psychology) are the assumptions about the lower quality of online education, despite recent evidence from the Department of Education (as well as my own dissertation) that it is at least comparable and sometimes preferable in terms of effectiveness. Consider this statement by a Fort Hays student:
If Straighter Line fails too many students or make courses too challenging, they run the risk of losing support from the schools that use their service. How do they maintain academic honesty in an entirely virtual class? How do they anticipate the needs of a wide variety of students if their courses are pre-designed and generic? Can anyone actually tell me (with a straight face) that virtual general education classes offer the same quality as face-to-face instruction from passionate educators on the FHSU campus? Why bother being a liberal arts institution if we are going to devalue general education courses?
Many of these concerns mirror those of trainers considering migration to online courses (usually because it’s easier and perceived as less expensive), which is why my interest is piqued.
For more of the debate, check out these articles and discussions:
- Washington Monthly: College for $99 a month
- Inside Higher Ed: Revolt Against Outsourced Courses
- Slashdot: All-You-Can-Eat-College for $99-a-month
- Kairosnews: The Perfect Storm Facing Higher Education
- Alex Reid: a straighterline to higher education hell
- Tony’s Brain: Straighterline Revisited
- SL-written press releases and promotional blog entries