In the CollegeConfidential discussion of my blog post The Difficulty With Data, CC poster mihcal1 made the following compelling comment:
So basically, it’s a perfect setup for the Illusion of Validity
Why is MIT’s admissions process better than random? Say you weeded out the un-qualified (the fewer-than-half of applicants insufficiently prepared to do the work at MIT) and then threw dice to stochastically select among the remaining candidates. Would this produce a lesser class?
The link in mihcal1’s post takes you to an article from New York Times magazine by Daniel Kahneman. Kahneman is a pioneer of behavioral economics and the psychology of decision making. He is one of my favorite social scientists, and his work laid the foundation for much of the social science research I love best.
In his article, Kahneman describes his time working as a psychologist for the Israeli Army. They were tasked, among other things, with putting officer candidates through a series of challenges (an application, as it were) to test their leadership potential. They would watch the candidates as they completed challenges, and then they would predict how well they would succeed at officer candidate school.
According to Kahneman:
…as it turned out, despite our certainty about the potential of individual candidates, our forecasts were largely useless. The evidence was overwhelming. Every few months we had a feedback session in which we could compare our evaluations of future cadets with the judgments of their commanders at the officer-training school. The story was always the same: our ability to predict performance at the school was negligible. Our forecasts were better than blind guesses, but not by much.
I thought that what was happening to us was remarkable. The statistical evidence of our failure should have shaken our confidence in our judgments of particular candidates, but it did not. It should also have caused us to moderate our predictions, but it did not. We knew as a general fact that our predictions were little better than random guesses, but we continued to feel and act as if each particular prediction was valid. I was reminded of visual illusions, which remain compelling even when you know that what you see is false. I was so struck by the analogy that I coined a term for our experience: the illusion of validity.
Why, asked mihcal1, were we, as admissions officers, so sure that we were right in our decisions? What made us think our decisions would be better than random guesses? And how can we know?
This is a very good question to ask, and a very difficult one to answer.
Part of the reason it is so difficult to answer is the problem I discussed in the last post, which is basically: well, what makes our decisions “better”? How do we know if one applicant is “better” than another? What does “better” even mean? We could cherrypick any number of metrics that would make the case in our favor. For example, over the last decade or so, our average applicant SAT score has gone up, and our average rate of admission has gone down. You might interpret this to say that we are admitting smarter students, and that we are doing a good job of recruiting applications too, so hey, we’re all doing a pretty good job!
Of course, I think those are terrible metrics by which to measure an applicant or an admissions process. What matters isn’t raw SAT score, or how many people we can convince to apply. What matters is making sure that we bring in smart students who feel at home here. Who love the community they are in. Who believe in the things that we do here at MIT and who will go out and make the world a better place.
As it turns out, those things are much, much harder to measure.
Does this mean that our process is no better than random? That all we are doing is admissions shamanism, voodooing behind the closed doors of the admissions committee before coming out into the light and announcing the signs we’ve read in the application’s entrails?
I don’t think so, for a few different reasons.
One reason is a fundamental limitation of social science: its findings are situation dependent, and so it is most usefully and reliably deployed for falsifying specific hypotheses rather than for drawing conclusions across contexts.
For example, Kahneman cites decades of research demonstrating that most stock pickers and fund managers do basically no better than random guessing would predict. This sort of question is right in the social science wheelhouse. Hypothesis: variance in skill explains differences in performance between investment managers. Test: do stock pickers routinely perform better than random chance would predict? Result: mostly, no. Hypothesis false, or at least seriously weakened.
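To see why “mostly, no” is the damning result here, it helps to see how much apparent skill pure luck can produce. Below is a minimal illustrative simulation (my own sketch, not from Kahneman’s research; the numbers of managers and years are arbitrary assumptions): managers whose odds of beating the market each year are a literal coin flip, yet a sizable fraction of whom end up “beating the market” in most years anyway.

```python
import random

def simulate_managers(n_managers=1000, n_years=10, p_beat=0.5, seed=0):
    """Simulate managers whose yearly market-beating odds are pure chance.

    Returns the fraction of managers who beat the market in a majority
    of years -- the ones who would look 'skilled' in hindsight.
    """
    rng = random.Random(seed)
    lucky = 0
    for _ in range(n_managers):
        # Each year is an independent coin flip with probability p_beat.
        wins = sum(rng.random() < p_beat for _ in range(n_years))
        if wins > n_years // 2:
            lucky += 1
    return lucky / n_managers

# With zero skill (p_beat=0.5), roughly a third of managers still
# beat the market in a majority of years, purely by luck.
print(simulate_managers())
```

The point of the test Kahneman describes is to ask whether real managers do any better than this no-skill baseline; if they don’t, “skill” isn’t doing any explanatory work.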
But it’s not clear that an admissions process is anything like picking stocks, so it’s also not clear that the same phenomena can be generalized to the work we do. Trying to carry such a slippery situational insight across different contexts is an intellectually dubious exercise.
Another problem with the Army example I alluded to earlier: what’s to say that the psychologists weren’t “better” at picking officers than their future commanders? What does “better” in this context even mean? Without measuring the judgments of the commanders, how could we know? And how would we measure it?
Clearly Kahneman thinks that some people (Israeli Army commanders) are better at picking some things (future officers) than other people (inexperienced psychologists). And this assumption actually reveals a pretty interesting premise: that there are some real experts. So let’s approach this from another angle: what conditions, according to Kahneman, might make you think that an expert is actually an expert? That a professional is actually good at their job, and not merely reproducing chance and taking credit for it?
True intuitive expertise is learned from prolonged experience with good feedback on mistakes. You are probably an expert in guessing your spouse’s mood from one word on the telephone; chess players find a strong move in a single glance at a complex position; and true legends of instant diagnoses are common among physicians. To know whether you can trust a particular intuitive judgment, there are two questions you should ask: Is the environment in which the judgment is made sufficiently regular to enable predictions from the available evidence? The answer is yes for diagnosticians, no for stock pickers. Do the professionals have an adequate opportunity to learn the cues and the regularities? The answer here depends on the professionals’ experience and on the quality and speed with which they discover their mistakes…Many of the professionals we encounter easily pass both tests, and their off-the-cuff judgments deserve to be taken seriously.
In other words, if you have a lot of experience, and if you have good, quick feedback on mistakes, then your intuition is likely to be better than random chance.
This, I think, characterizes our admissions office. In any given admissions committee, decades and decades of admissions experience are directed towards examining a single applicant and all of the information – essays, interviews, letters of recommendation, awards from external experts – we have about them. In fact, I laughed a little at Kahneman’s reference to “true legends of instant diagnoses are common among physicians”, because McGreggor Crowley, who directs our admissions process, is a physician, and if there is anybody who is legendary for his ability to “diagnose” an applicant, it’s him.
And we have good, rapid feedback too. We meet most students we admit soon after at CPW. We then spend four (or more) years living with them. They work in our offices. We advise them academically. We become friends as the years go on. So we don’t just have feedback on our decisions. We quite literally live with them.
Finally, there is the point that David made in his last blog post, which is essentially that there are many types of admissions processes, and that it doesn’t matter whether they are “fair” as much as it matters that they “work”, which is to say that they produce the sort of community that you aspire to be a part of.
I think there is a lot of truth in that. Fundamentally an admissions process is measured not by what it is but by what it does, which is of course to constitute a community. That doesn’t mean we aren’t reflective or analytical about the way we do things: in fact, we employ two terrific statisticians within our office alone specifically to run the data and tell us how to do things better!
But it does mean that the only real standard which matters is whether the students, the faculty, and the rest of the world think that MIT students are awesome people who do awesome things, and that our students feel at home here. By this standard, I think our process does a very, very good job.
And that, my friends, is no illusion.