Sep 1, 2005
Lies, Damned Lies, and Statistics
Posted in: Miscellaneous
I do understand that human beings are not intuitively good at statistics. Daniel Kahneman won the Nobel Prize in Economics in 2002 for showing, among other things, that people in general are terrible at statistical thinking. From Kahneman's Nobel biography:
The standard example of a framing problem, which was developed quite early, is the 'lives saved, lives lost' question, which offers a choice between two public-health programs proposed to deal with an epidemic that is threatening 600 lives: one program will save 200 lives, the other has a 1/3 chance of saving all 600 lives and a 2/3 chance of saving none. In this version, people prefer the program that will save 200 lives for sure. In the second version, one program will result in 400 deaths, the other has a 2/3 chance of 600 deaths and a 1/3 chance of no deaths. In this formulation most people prefer the gamble. If the same respondents are given the two problems on separate occasions, many give incompatible responses. When confronted with their inconsistency, people are quite embarrassed. They are also quite helpless to resolve the inconsistency.
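The punchline of the framing problem is that the two versions describe numerically identical choices. A quick sanity check (a sketch, with the numbers taken straight from the quote above):

```python
# Kahneman's two framings of the same epidemic problem, 600 lives at stake.
# "Saved" framing:  200 saved for sure  vs. 1/3 chance all 600 are saved.
# "Deaths" framing: 400 die for sure    vs. 2/3 chance of 600 deaths.
total = 600

sure_saved = 200
gamble_saved = (1/3) * 600 + (2/3) * 0          # expected lives saved

sure_deaths = 400
gamble_deaths = (2/3) * 600 + (1/3) * 0         # expected deaths

# Every option has the same expected outcome: 200 saved, 400 lost.
print(sure_saved, gamble_saved)                  # 200 200.0
print(total - sure_deaths, total - gamble_deaths)  # 200 200.0
```

Same numbers, opposite preferences, purely because of the wording.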
Still, just because other people are bad at statistics doesn't mean I'll accept that as an excuse from you as a prospective MIT applicant. Most MIT majors require or suggest a course in probability and/or statistics, so you might as well get a head start in statistical thinking now.
First, a few facts on which to chew:
1. The overall admission rate for the class of 2009 was 14.3%. (From here.)
2. Applicants who interviewed (or had their interview waived) had a 19% admission rate; those who didn't interview had a 7% admission rate. (I don't have a citation for this, which is sketchy, so feel free not to believe me. But although I can't remember where I found the numbers, this is close enough to the truth for the purposes of this entry.)
3. Applicants with SAT scores in the 88th percentile (roughly a 1290 old SAT) have about a 5% admission rate, while those with perfect scores have about a 50% admission rate. (From here -- a very fun read, if you're into this kind of thing. I highly suggest it!)
So does this mean that you can pour all of your personal data into some magic admissions algorithm and have it spit out a number which reflects your chances of getting into MIT?
First of all, no. And even if it could, the number wouldn't matter. For example, if the computer said that you had a 33% chance, that would mean that if you applied to MIT many times, you would expect to get in on approximately 1 in 3 of those tries. (And we're not talking "if you applied 3 times" here. I think applying 500 times would probably give a good estimate, but I don't feel like playing around with Matlab to see if that's true.) Of course, you can't apply to MIT 500 times in a single year, or even in your lifetime, so it's pointless to try to stick a number on your chances at MIT.
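If you do feel like playing around (in Python rather than Matlab), the point is easy to demonstrate: a hypothetical 33% "admission probability" only shows up reliably in the observed success rate once the number of tries gets large. This is just the law of large numbers, with a made-up probability for illustration:

```python
import random

random.seed(0)
p = 1/3  # hypothetical "admission probability" -- an illustration, not a real figure

def admit_fraction(n_apps):
    """Fraction of n_apps independent simulated applications that succeed."""
    return sum(random.random() < p for _ in range(n_apps)) / n_apps

# With only 3 tries, the observed rate swings wildly between repeats...
print([round(admit_fraction(3), 2) for _ in range(5)])
# ...but with 500 tries per repeat it settles close to 1/3.
print([round(admit_fraction(500), 3) for _ in range(5)])
```

With 3 applications you can easily see 0%, 33%, or 100% admitted; with 500 you'll hover near 33%. Since nobody gets 500 independent tries, the "33% chance" never has a chance to mean anything for an individual applicant.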
I guess the moral of the story here is that no one is a shoo-in for MIT, but the opposite is also true -- nobody should think they have no hope. But it's pointless to over-think this issue, because you just can't control for all the variables.
For what it's worth, my Super Getting into MIT Guide goes something like this:
1. Do something that you really care about, and make sure you write about it glowingly on your application.
2. Interview, and don't be lame and fake at said interview.
3. Get good scores on the SAT I and SAT IIs.
4. Take difficult classes at your high school (or even local community college) and get good grades in them.
And, of course, you can get into MIT if you only have three of these four characteristics... you can get in if you only have two... you can get in if you only have one. But even if you have four, you're not a sure thing.
My final statistics lesson has to do with something you may have heard -- that MIT supposedly has a stratospherically high suicide rate. This is a contention supported by the Boston Globe, a group of stellar journalists, I'm sure, but not so good at the statistics thing. (I can't find the original Globe article, but the article here makes all the points the original article made.) The Globe basically looked at the MIT suicide rate between 1990 and 1999, compared it to suicide rates at other schools, and decided it was too high. (Let's just say there's a reason the Globe article wasn't published in a scientific journal. Sweeping conclusions backed up by questionable data like that make scientists -- including me -- want to bang their heads on hard surfaces.)
Now let's look at some problems with the Globe's grandiose conclusions:
1. People who successfully commit suicide are significantly more likely to be young and male. In the 1990s, the average MIT student was both those things; since then, the population has famously evened out. (Source here; relevant quote: "In fact, MIT's suicide rate is below the national average if one adjusts figures for the school's overwhelmingly male student body [during the years of the study].")
2. Moreover, science, engineering, and business students have significantly higher suicide rates than do liberal arts students. MIT undergraduates are almost exclusively science, engineering, and/or business majors. Given that both those things are true, one would expect MIT to have a high suicide rate based on those demographics alone. (Source here; relevant quote: "Based on 10 undergraduate suicides over 11 years, the article concludes that suicide is a greater danger at MIT than elsewhere. When one factors in that science and business students have considerably higher suicide rates than liberal arts students, and that male college students kill themselves five times more often than female college students, the figures quoted prove nothing. MIT is cited as currently being composed of 59 percent male students; that fact alone would make the suicide rate differences with most other colleges understandable; but in the early 1990s an even higher percentage of the students at MIT were male.")
3. The Globe compared MIT to other schools with engineering programs, which is a terrible control. Other schools have engineering programs, yes, but few other schools have 50% of the undergraduate student body majoring in engineering. Without appropriate controls, you can't draw meaningful conclusions (and it's difficult to think of a school that would be a good control -- Caltech is science/engineering focused too, but using only one school as the control population would be pretty sketchy).
4. Statistics like this are terribly vulnerable to small swings in absolute numbers. The absolute number of suicides is very small, and therefore it takes many of them spread over many years to accurately determine whether or not the rate in one place is higher or lower than the rate in another. (Source here; quote: "Because of small number statistics, the 'true' suicide rate -- i.e., that which would be measured by a very large MIT in the limit of an infinite number of students -- is, to 95% confidence, approximately 100,000*(11 +/- 2*sqrt(11))/48,000. At this level, MIT's suicide rate is consistent with the national average... it would take approximately another thirty three years in order to obtain a measurement of the MIT suicide rate that could be distinguished from the national average at 95% confidence.")
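To make the quoted back-of-the-envelope concrete: 11 suicides over roughly 48,000 student-years, with the usual ±2√N normal approximation to Poisson counting error, gives a very wide confidence interval. A quick sketch of that arithmetic:

```python
import math

# Back-of-the-envelope from the quoted source: 11 suicides observed over
# ~48,000 student-years, rate expressed per 100,000 students per year.
# The +/- 2*sqrt(N) term is the ~95% normal approximation to Poisson error.
observed = 11
student_years = 48_000

rate = 100_000 * observed / student_years                     # ~22.9
half_width = 100_000 * 2 * math.sqrt(observed) / student_years  # ~13.8

print(f"{rate:.1f} +/- {half_width:.1f} per 100,000 per year")
# i.e. anywhere from roughly 9 to 37 per 100,000 per year
```

An interval that wide easily overlaps typical national figures for young men, which is exactly why the Globe's comparison proves nothing either way.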
So now you know. Go out, and tell my story to the masses. ;)