MIT staff blogger Chris Peterson SM '13

The Difficulty With Data by Chris Peterson SM '13

the shadow of an application

Over the last few weeks I’ve posted entries about diversity vs merit and the holistic admissions process. And while I hope that these entries have contributed some insight into how and why we do the things we do, one complaint in the comments on those entries was about a lack of data to accompany and support the claims I had made. As one commenter put it:

MIT should release the full set of admissions data stripped of personally identifying information and let the community analyze it, because in the scientific community we trust data and analysis, not assertions.

 

So let’s discuss admissions data.

First, I’d like to say that I’m a huge fan of statistics. I read 538 and Football Outsiders every day. When it comes to baseball I’m a converted sabermetrician. In the natural world, I believe in the scientific method, which is to say I believe in data-driven analyses of phenomena, empirical evidence, and testable hypotheses as the best, and sometimes only, route to understanding most things which occur in our universe.

But there is a problem with social science, and that problem is this: sometimes, you don’t have all of the data, either because it is unavailable to you, or because something can’t be captured. And then, if you try to build a model based on these incomplete data, you are liable to draw conclusions consistent with the data but descriptively incorrect.

At its most basic form, it’s a variant of post hoc ergo propter hoc – “after this, therefore because of this.” The rooster crows, then the sun rises; all hail the befeathered Sun King! In more complex forms, it’s a very subtle misattribution of traits based on the ontologies used to characterize them, which begets an epistemological crisis: what do we measure and how do we measure it? Is the trait thus measured determinative or merely descriptive? And so forth.

But let’s back away from the analytical theory for a moment and ground what I’m saying in some concrete examples.

Here’s another comment from my diversity vs merit post about SAT scores:

This pretty much sums it up:

SAT Math
750-800 15%
700-740 10%
650-690 5%

From what you wrote you’d think being in the 700-740 range and being in the 750-800 range doesn’t have much impact on your chance of admission, but there’s a 50% difference.

Now, I and others are on the record as saying that we admit people, not test scores, and that in any case there is really not a difference in our process between someone who scores, say, a 740 on the SAT math, and someone who scores an 800 on the SAT math. So why, as the commenter asks, is there such a difference in the admit rate? Aha! Clearly we DO prefer higher SAT scores!

Well no, we don’t. What we prefer are things which may coincide with higher SAT scores. For example, a student who receives a gold medal at the IMO is probably more likely to score an 800 on the math SAT than a 740. But if we take an IMO medalist (with an 800) over random applicant X (with a 740), does that mean we preferred an 800 to a 740? No. It means we preferred the IMO medalist, who also happened to get an 800!
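To make that concrete, here is a minimal simulation sketch. It is not our actual process: the hidden `strength` trait, the score formula, and the admit threshold are all made-up assumptions for illustration. The decision rule never looks at the score, yet the admit rate still climbs with the score band, because the score happens to travel with the trait that was actually selected for.

```python
import random


def simulate(n=50_000, seed=0):
    """Simulate applicants whose admission decision never looks at the test score."""
    rng = random.Random(seed)
    applicants = []
    for _ in range(n):
        # Hypothetical hidden trait standing in for accomplishments, letters, etc.
        strength = rng.gauss(0.0, 1.0)
        # The score is correlated with the trait but plays no role in the decision.
        raw = 700 + 40 * strength + rng.gauss(0.0, 30.0)
        score = int(max(200, min(800, round(raw / 10) * 10)))
        # The decision uses only the hidden trait plus noise unrelated to the score.
        admitted = strength + rng.gauss(0.0, 0.5) > 1.5
        applicants.append((score, admitted))
    return applicants


def admit_rate(applicants, low, high):
    """Share of applicants with low <= score <= high who were admitted."""
    band = [admitted for score, admitted in applicants if low <= score <= high]
    return sum(band) / len(band) if band else float("nan")


applicants = simulate()
rates = {
    "750-800": admit_rate(applicants, 750, 800),
    "700-740": admit_rate(applicants, 700, 740),
    "650-690": admit_rate(applicants, 650, 690),
}
for band, rate in rates.items():
    print(f"{band}: {rate:.1%}")
```

Running this prints a higher admit rate for each higher band, even though `score` never appears in the line that sets `admitted`. Someone looking only at the printed table could easily conclude that the score drove the decision.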

The same goes for people who are highly ranked in their graduating class. Almost half of the class of 2015 were valedictorians of their high school. Aha! MIT must highly value class rank in our application! No, we don’t. Then why does this happen? Because we do highly value certain academic accomplishments, and if you are doing well enough academically to achieve these things, then you are probably doing pretty well in high school. Additionally, we highly value strong letters of recommendation, and often teachers strongly support students who really blow them away academically.

So we select for these other traits and end up, as a side effect, with a disproportionate number of valedictorians. But it’s not because they’re valedictorians that we select them, but rather that because of the things for which we select they are valedictorians. Or, to paraphrase a line from Llewellyn: being a valedictorian isn’t the reason for the decision; it’s the result of factors which were the reason for the decision.

You see what happens here. It’s correlation misdiagnosed as causation, and then interpreted through a particular narrative frame to conform (and confirm) to prior expectations. This happens all the time in shoddy social science. And it inevitably occurs with whatever data we do release. If we released admit rate by state, it would be: The admit rate for students from Wisconsin went up 2%, MIT must really want applicants from Wisconsin! When the reality would be much closer to: we took whom we wanted to take, and they were from Wisconsin. Was Wisconsin considered in a complex ecology of decisionmaking? To some degree, yes; that’s what we mean when we say we “read everything” and have a contextual, holistic process. But was it a determinative characteristic, one which could be separated out as a causal agent? Could Wisconsin be assigned a standard weight in a model of our decision process? Absolutely not.

What’s happening here is a fundamental confusion between our admissions process and the results of that process. When we say that the admit rate for students with a 750-800 was 15%, it does not mean that the chance of admission for a given applicant who scores between 750 and 800 is 15%. It means that those students whom we chose to admit included 15% of those who scored within the 750-800 range. It’s a subtle distinction, but an important one in understanding the agency of admissions.

Think of it as the difference between a living thing and its fossil. A fossil isn’t the plant or animal itself: it’s the mineral imprint of the stuff that’s left behind. Or think of it like a shadow. A shadow is not the thing which casts a shadow. It’s the contours of where the light isn’t.

That’s how our admissions data work. They show you where the decision wasn’t. They show you the shape of our decisions, not the basis on which they were made. Admissions data are an accretion of the sediment which dropped to the bottom of the decisions delta, and not the moving river where the actual action happened.

But Jurassic Park was a work of fiction, and just like you can’t reanimate a velociraptor from its fossil, you can’t understand the life of an applicant from the shadow of their data. This is why I hate “chance” threads so much. When an applicant says “I have X SAT score and Y GPA, what are my chances to get into MIT” it’s not a question I or anyone else can answer. Because, within certain bounds of sufficient academic preparation, the decision isn’t made on these easily extracted and quantified points of data. The decision is about everything else.

The response to this, of course, is “well, so release the data on everything else!” To which I ask: how? How can we meaningfully quantify how much a teacher supports a student? How can we meaningfully quantify that particularly poignant essay which shows a student’s resolve, or that particularly funny essay that makes us love their personality? Even if we did construct, ex nihilo, categorical cubbies to shove these interactions and experiences into, isn’t that the same subjectivity wearing an objective mask? I don’t think that “Rate this applicant’s leadership from 1 to 5” is a particularly objective exercise just because we slapped a number on it. Trying to convert inherently subjective interpretations to objective quantities is like wearing fashionable glasses of an incorrect prescription: it may look hip, but all it ultimately does is cloud your vision.

I understand that for the initial commenter and others this may be an unsatisfying explanation. MIT is a community which loves data, where people believe data can do anything, and where any explanation which undercuts the utility of data seems suspiciously unscientific.

But Clay Shirky once gave a talk about how memes – jokes, YouTube videos, lolcats, whatever – spread through the Internet, and he said something to the effect that the physics of memes were more like the physics of weather than the physics of a falling object. We understand how things fall pretty well, and we can be pretty accurate in our understanding of when and where and how fast something will drop. But even though we have reams and reams of data about the weather, because of its utter complexity the best way to characterize what will happen the next day is often no better than “partly cloudy with a chance of rain.”

Well, the physics of holistic admissions are akin to Shirky’s idea of “social weather.” Based on easily apprehendable information, you might know roughly what the temperature (of an applicant) will be, and hazard a guess as to whether it will rain. But until all of the ingredients mix together in our admissions committees, like a storm forming over the gulf, you don’t know upon whom a ray of sun will break through the clouds until it actually, finally happens.

21 responses to “The Difficulty With Data”

  1. I love your fashionable glasses and weather metaphors. :)

  2. Salute. You gain all the respect that I have to offer for working hard on such an excellent post. I am now sure that my faith in your process is not misplaced! This should deal a good blow to all the wolves who cried that the grapes were sour!

  3. jvl says:

    Thanks for another great post Chris. Loved the last metaphors.

  4. m_quinn says:

    @Chris P

    Hey Chris, I’ve got one: if it looks like a duck, walks like a duck, and quacks like a duck then it must be a duck … remember that? If MIT admits Asians at rate 5x their population in the U.S., and admits African Americans at about 50% their population in the U.S, then MIT must be employing racial preference in admissions decisions.

    I’m not buying your excuse for suppressing SES statistics. What are you hiding????? And, by the way, why won’t you post the names of those states which are “not represented” in the class of 2015? Again, what are you hiding????

    m_quinn

  5. Hesham '15 says:

    Brilliant. Right alongside Ben’s “It’s More Than a Job,” which is something I don’t say lightly. Hope this silences yet another generation of haters.
    Anyone who doesn’t believe Chris should come visit MIT: see what kinds of people study here. MIT students are way beyond GPAs and SAT scores. These measures are not enough.
    Best of luck to this year’s applicants. Be yourselves and focus on achievements that mean the most to YOU. Part 2 of the MIT app gives you plenty of room to do this. Enjoy it.

  6. M. S., The Champion of Galaxies says:

    m_quinn makes me giggle. Tee hee!

  7. Kun Cao says:

    very well said

  8. Lawrence Graceful Danger says:

    Dearest m_quinn,

    Race shouldn’t be confused for culture or traditional stereotypes. I just had a discussion in class the other day on affirmative action and the most common flaw in reasoning was that race necessitates a certain type of behavior. In this case, being accepted or rejected to MIT.
    A large proportion of the Asian population applies and is qualified for MIT compared to blacks. This has nothing to do with race, although it may have something to do with culture. You’re assuming that the entire American population of Asians and African Americans are all applying to MIT with comparable statistics and that Asians are being accepted at a higher rate.
    The excessive usage of question marks is a signature of the troll.

  9. Covi says:

    I recommend Gladwell’s Blink and Outliers. Everything about how statistics lie (or how people are fooled by statistics) can be found in his books.

  10. Chris Peterson SM '13 says:

    @valart –

    I heartily agree two times is a good number if only because you will be more familiar with the test. Sometimes, students feel they can score materially better a third time, and if that is the case, and if cost is not a serious object, I’d recommend it. I’d probably recommend doing two junior year and one senior year, if only because you have the opportunity to incorporate things you’ve picked up during senior year and had more time to learn generally.

  11. Ryan says:

    Dear Chris,

    I have recently completed my application and am very excited to hear about the results in December. However, I am slightly worried about my test scores. You and others in the admissions department regularly state something to the effect of “as long as the scores demonstrate the person can do the work, we can move on to evaluating their other features.” I received a 690 on the Math section of the SAT, a 750 on SAT Math II, and a 34 on the ACT Math Section. I know that based on the admissions data that they are all about the 25th percentile or less for you guys…but based on this article I get the impression that these three numbers won’t necessarily make or break me; that if the rest of my application demonstrates me as a good applicant I will still have a good chance of being accepted. Am I correct in making this assumption?
    Thank you!

  12. Chris Peterson SM '13 says:

    Ryan –

    I think you’ve got the right idea.

  13. valart says:

    Chris,
    I hear it’s helpful to take the SAT at least twice given many schools take the highest score on each test. Do you think it’s worth taking it 3 times? Should a junior take it as early as December, then again in May and once more senior year in October? Would it be better to skip the December test junior year?
    I think the MIT admissions process sounds like the fairest system out there. I like that it takes each individual’s grades, schooling, extracurricular activities, situation and personality into account. It seems the students are looked at as individuals and not a bunch of statistics.

  14. jcy036 says:

    I appreciate this post; it makes me more at ease with the whole admissions idea. Keep it up.

  15. Mark Dookharan says:

    Chris,
    I’m in the process of finishing up applications, and through the process, your posts are the most interesting of all to read. I feel a great deal of respect every time I read your posts and learn about MIT’s admissions process – I love it!

  16. Nathan ('16?) says:

    Chris,
    I’ve been following your last few posts very closely, and I have to say that I now have a huge respect for you. You’ve done a great job researching your point and an excellent job presenting it and explaining it. I sincerely hope I can meet you someday (after I am admitted, naturally :D )

  17. Abderhman says:

    Hi Mr.Chris

    I want to know if I could send my Toefl score report by fax because sending one from ets is expensive for me ???

    Abderhman , Gaza strip

  18. Ruslan'16(hopeful) says:

    Past three years I spend my life failing on final scores of High schools, in 2009 I lost everything: My girlfriend, Chance to study in med school and my parents’ respect. However, I think they still didn’t give up on me. I knew about MIT but some things put an obstacle on my great path. I decided to become an Electrical Engineer and devote my life to innovations. Year passed and I got the International Scholarship allowed to study abroad. It was my moment for life. Now it is time to prove another things… I am not domestic applicant, it means a lot of things, but as one famous person said: “The greater danger for most of us lies not in setting our aim too high and falling short; but in setting our aim too low, and achieving our mark.” (M.B.)
    never give up Guys,
    Thank you Chris

  19. Pete says:

    @Ruslan Very inspiring, and I wish you the best of luck whether or not you get in. It sounds like you’ll be able to make waves in this life either way.

  20. Deb says:

    the admission process is really wonderful!!!!!!

  21. Dear Chris,
    I changed my mind about which math SAT II to take, so the one I reported in the second part of the application as the one I would take will not match the score report the admissions office will receive. Is this a problem? And will my AP test scores in math be taken into account? Thanks for your help!