Do Recommendation Letters Actually Tell Us Anything Useful?
Recommendation letters are one of the most face-valid predictors of academic and job performance; it is certainly intuitive that someone writing about a person they know well should be able to provide an honest and informed assessment of that person’s capabilities. But despite their ubiquity, little research is available on the actual validity of recommendation letters in predicting academic and job performance. They look like they should predict performance, but do they?
There is certainly reason to be concerned. In the small research literature available on recommendation letters, the results don’t look good. Selection of writers is biased; we don’t usually ask people who dislike us to write letters for us. Writers themselves are then biased; many won’t agree to write a recommendation letter if the only letter they could write would be a weak one. And among those who do write letters, the personality of the letter-writer may play a larger role in the content than the ability level of the recommendee. Given all that, are letters still worth considering?
In a recent issue of the International Journal of Selection and Assessment, Kuncel, Kochevar, and Ones[1] examine the predictive value of recommendation letters for college and graduate school admissions, both in terms of raw relationships with various outcomes of interest and incrementally beyond standardized test scores and GPA. The short answer: letters do weakly predict outcomes, but generally don’t add much beyond test scores and GPA. For graduate students, the one outcome for which letters do add some incremental predictive value is degree attainment (which the researchers argue is a more motivation-driven outcome than either test scores or GPA) – but even then, not by much.
Kuncel and colleagues came to this conclusion by conducting a meta-analysis of the existing literature on recommendation letters, which is unfortunately not terribly extensive. The largest number of studies appearing in any particular analysis was 16; most analyses summarized only 5 or 6 studies. Thus the confidence intervals surrounding their estimates are likely quite wide, leaving a lot of uncertainty around the precise values they report. That doesn’t necessarily threaten the validity of their conclusions – these are certainly the best estimates of recommendation letter validity available right now – but it does highlight the somewhat desperate need for more research in this area.
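To get a feel for just how wide those intervals can be, here is a back-of-the-envelope illustration of my own (not a calculation from the paper): an approximate fixed-effect confidence interval for a pooled correlation, with made-up study sizes. A random-effects interval, which allows for between-study heterogeneity, would be wider still.

```python
# Back-of-the-envelope only (not from the paper): why pooling a handful
# of studies still leaves a wide confidence interval around a mean
# correlation. All inputs are made up.
import math

def fisher_ci(r_mean, n_per_study, k, z_crit=1.96):
    """Approximate 95% CI for a correlation pooled across k equal-n
    studies, computed on the Fisher z scale and transformed back."""
    z = math.atanh(r_mean)                       # Fisher r-to-z transform
    se = 1 / math.sqrt(k * (n_per_study - 3))    # pooled standard error
    return (round(math.tanh(z - z_crit * se), 2),
            round(math.tanh(z + z_crit * se), 2))

print(fisher_ci(0.20, 100, 5))    # (0.11, 0.28): 5 studies, fairly wide
print(fisher_ci(0.20, 100, 16))   # (0.15, 0.25): 16 studies, narrower
```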
Another caveat to these findings: the studies included in any meta-analysis must have reported enough information to obtain correlation estimates for the relationships of interest. In this case, that means the included studies needed to have quantified recommendation letter quality. I suspect many people reading recommendation letters instead interpret those letters holistically – for example, reading the entire letter and forming a general judgment about how strong it was. That holistic judgment is probably then combined with other holistic judgments to make an actual selection decision. Given what we know about statistical versus holistic combination (in short, there is basically no good reason to use holistic combination), any incremental value gained by using recommendation letters may be lost in such very human, very flawed judgments.
So what’s the conclusion? At the very least, using recommendation letters doesn’t appear to hurt the validity of selection. If you want to use such letters, you will likely get the most value by coming up with a reasonable numerical scale (e.g., 1 to 10), assigning each letter you receive a value on that scale to indicate how strong the endorsement is, and then averaging those ratings and entering the result alongside the other components of your statistically derived selection system (e.g., GPA and standardized test scores), as in the sketch below.
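Here is a minimal sketch of that scoring scheme. The applicant data, scales, and equal weights are all hypothetical; in practice, weights should come from local validation data.

```python
# A minimal sketch of the statistical-combination scheme described
# above. Applicants, letter ratings, GPAs, and test scores are made up.
from statistics import mean, stdev

def zscores(values):
    """Standardize a list of values (mean 0, sd 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

applicants = ["A", "B", "C"]
# One 1-10 rating per letter received, averaged per applicant
letter_means = [mean(ls) for ls in ([8, 9, 7], [6, 5, 7], [9, 8, 8])]
gpas = [3.6, 3.9, 3.2]
tests = [310, 305, 320]

# Standardize each component, then combine with equal weights
parts = [zscores(letter_means), zscores(gpas), zscores(tests)]
for name, scores in zip(applicants, zip(*parts)):
    print(name, round(mean(scores), 2))
```

Standardizing each component first puts letter ratings, GPA, and test scores on a common scale before averaging; with local validation data, regression weights could replace the equal weights used here.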
- [1] Kuncel, N. R., Kochevar, R. J., & Ones, D. S. (2014). A meta-analysis of letters of recommendation in college and graduate admissions: Reasons for hope. International Journal of Selection and Assessment, 22(1), 101–107. doi:10.1111/ijsa.12060
The two big questions I’d have relative to this are: (1) what are the results like on job performance as opposed to academic performance and (2) do the extant studies account for what I would consider probably the most important factor with respect to academics—the status of the recommender within her/his field?
Obviously, (2) is going to be less of an issue at the admissions level, but in domains where letters play a significant role in hiring (like academia), it seems like it might be rather important. Off the top of my head, one could look at tenure attainment as an outcome, with predictors like publication count, log citation count, years since degree, and possibly some measure of service (see the sketch below).
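Something like the following sketch is what I have in mind; the data and variable names are entirely made up:

```python
# Entirely hypothetical: predict tenure attainment from publication
# count, log citation count, and years since degree, on simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
pubs = rng.poisson(10, n)                   # publication count
log_cites = np.log1p(rng.poisson(50, n))    # log citation count
years = rng.uniform(5, 9, n)                # years since degree

# Simulate tenure as a noisy function of the predictors
logit = -4 + 0.15 * pubs + 0.5 * log_cites + 0.1 * years
tenure = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([pubs, log_cites, years])
model = LogisticRegression().fit(X, tenure)
print(dict(zip(["pubs", "log_cites", "years"], model.coef_[0].round(2))))
```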
In any case, thanks for the blog post. It does seem like an interesting question.
The relationship with actual job performance was my first question – unfortunately, that literature is even less developed than this one.
The role of individual differences in letter-writing is pretty massive – aside from expertise effects, there are even personality effects. The issue here is really one of reliability: on average, across three letters, do you end up tapping a reasonable amount of true-score variance? (The sketch below gives a rough sense of how much averaging helps.)
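As a rough illustration (my own numbers, and it assumes letters act like parallel raters, which is generous), the Spearman-Brown prophecy formula suggests how much averaging across letters can help:

```python
# Rough illustration: if letters behave like parallel raters, the
# Spearman-Brown prophecy formula estimates the reliability of the
# average of k letters from the single-letter reliability r.

def spearman_brown(r, k):
    """Reliability of the mean of k parallel measurements,
    given single-measurement reliability r."""
    return k * r / (1 + (k - 1) * r)

# Hypothetical: even if any one letter is a noisy signal (say r = .30),
# averaging across three letters helps somewhat.
for k in (1, 2, 3):
    print(k, round(spearman_brown(0.30, k), 2))
# 1 0.3
# 2 0.46
# 3 0.56
```

Even so, three letters only get you partway there, which is consistent with the weak validities in the meta-analysis.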
I don’t know that academic outcomes at the level you’re talking about would be all that useful. In many academic fields (including I/O Psych), most PhDs don’t even go into academia. Among those who do, you’re going to end up with severe range restriction. Perhaps a binary indicator of a job offer in the field of training within a year after graduation would be preferable?
I think it would be very difficult to quantify letters of recommendation. The post mentions the common selection biases, which makes reducing letters to a simple numeric rating questionable. But there are other difficulties when you want to judge the quality of a letter. For instance, not everyone is a talented writer, and sometimes what was supposed to be a glowing recommendation just reads like a mediocre reference. That’s why tools such as [link] have become so popular in recent years. And furthermore, how do you take into account the “quality” of the referee? There is certainly a difference between a junior lecturer and the dean of the business school, and a reader will pick up on that. But how do you rate that in numbers?
Mr. Landers,
In your article you mention the limited research available on recommendation letters. I have recently started to research the usefulness of recommendation letters in the selection process and have found few peer-reviewed sources on the subject.
Would you happen to have any sources you could recommend on the subject?
Any assistance you can provide will be appreciated.
I’m afraid I don’t have a bibliography on this topic handy, sorry. But I would recommend starting with Judge and Higgins 1998 in OBHDP and spidering off from there (articles it cites and articles that have cited it).