# Ben Jones

May 5, 2005

## Happy Cinco De Mayo!!!

Posted in: Miscellaneous

Admissions counselors make the sexiest models. Check out these t-shirts, which were made by the good folks in LUChA. Awesome.

Math question: if Salvador and Amy each read 922 applications, and we received 10,500 applications this year, what is the probability that one of them read your application (assuming you were in the applicant pool)? Knowing that each app gets two reads, what is the probability that both of them read your application? Those are easy, right?

Okay, reading season lasts from early November through late February (for this problem we'll say November 1 through February 28.) Salvador takes 5 days of vacation in December and 2 days in Janurary. Amy takes 3 days of vacation in November, 1 in December, 2 in January, and 1 in February. All else being equal - for applications for which Salvador and Amy are the two readers, what is the probability that Salvador read a given application before Amy?

First person to answer correctly gets a prize. Then again, how would I even know what the correct answer is? Hehe.

- - -

Edit: okay, first person to provide an answer (with documented process) that gets a stamp of approval from Keith gets the prize.

#### Comments (Closed after 30 days to reduce spam)

What we really need to know is, what's the prize?

Posted by: smoke coming from ears on May 5, 2005

who is that salvador- he's hot

Posted by: 0 on May 5, 2005

ok i got the first two...
but im not tellin b4 anybody else does ..lol

Posted by: Laila Shabir on May 5, 2005

they both look like stars.

Posted by: smr on May 5, 2005

does amy have a BF?

Posted by: 0 on May 5, 2005

Well, for the first one:
1) I am going to assume the two piques are independent. I mean, if Amy chooses an application it doesn't affect Salvador's choice (although that might not be true but would render this unsolvable).

For each one:
To calculate the probability that either of them will read it, it's:
1 - P(Both haven't read it)
= 1 - (9578/10500)^2 or 0.1679

The probability that both have read it is just
(922/10500)^2 or 0.007711

I'm trying to work on the second one now...
although it's sort of weird.

Posted by: Mike Axiak on May 5, 2005
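
Mike's arithmetic is easy to check in a few lines. The sketch below assumes, as he does, that the two readers' picks are independent and that each reads 922 of the 10,500 applications. The function names are made up for illustration.

```cpp
#include <cassert>

// Probability that a single reader who reads `read` of `total` applications
// reads a given one.
double probOne(int read, int total) {
    assert(total > 0 && read >= 0 && read <= total);
    return static_cast<double>(read) / total;
}

// Probability that at least one of two independent readers reads it:
// 1 - P(neither reads it).
double probEither(int read, int total) {
    double miss = 1.0 - probOne(read, total);
    return 1.0 - miss * miss;
}

// Probability that both independent readers read it.
double probBoth(int read, int total) {
    double hit = probOne(read, total);
    return hit * hit;
}
```

Calling `probEither(922, 10500)` gives about 0.1679 and `probBoth(922, 10500)` about 0.007711, matching the figures above.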

HOLA!
It's senior day as well as Cinco de Mayo. 05/05/05.
ps how about you ask a question about the brain, or rock n roll, then we'll talk...

Posted by: Kiersten on May 5, 2005

hi ben, when will you know about the waitlist?

Posted by: me on May 5, 2005

Well done Mike! Full marks to Mike Axiak for problem 1.

I think that Ben should have to answer the second one.

Posted by: Keith W on May 5, 2005

Yea... the second one IS weirder than I thought...I tried playing with geometric distributions to no avail.
Oh well...maybe I'll get it in a few days
Also, replace 'piques' with 'picks' . (typoes...)

Posted by: Mike Axiak on May 5, 2005

Mike - you rock! Can't wait to see what you come up with for the second part. No worries on the typo; I had one of my own in the word "January."

Me - (well not me, but "me") - middle of May is when they'll start making decisions, I believe. I should know a bit earlier than that whether or not we're even going to the waitlist... hopefully yes, since that was the intention, but it all depends on yield.

Keith - thanks for being the master of ceremonies! As for answering the second part... google can't help me with this one!

Posted by: Ben on May 5, 2005

Ben, I think answer number 2 is going to take a while. You've given us a hard task. You didn't even tell us how often they get applications, and how many. Mike and I together are getting a bit stuck! We'll figure something out eventually though.

Posted by: Jane W on May 5, 2005

Eh.. just been back from a engg. test and came to this blog to find a probability question

The second part (unless i'm dumb) is easier than the first. You see, either Salvador OR Amy will read an app first. So, probability that either of them reads a given app first is half of the total probability.

Hence, answer = [(922/10500)^2]/2 = 0.003855

Didn't need the number of days they work or the number of apps read/day. [Correct me if I'm wrong.]

Posted by: Prashant on May 8, 2005

Oh, I was referring to Problem 2, not part(b) of problem one. The vacation days just don't work out.

Posted by: Jane W on May 8, 2005

sorry- havent been here in a while.

Prashant, the prob isn't 50:50. The fact that amy took holidays before salvador slightly decreases her chances. Of course, once you put everything together salvador may be the one at a disadvantage (assuming reading an application first is something they want to do) but it isn't a direct 50%. I've just seen the thing, so i'll try it out for a couple of days and see if i can get anywhere with it. Right now i'm just delighted such questions don't come in any of our exams.

Posted by: Shashank on May 15, 2005

Jane, does it matter how many applications they get? I think it'll be safe to assume that the probability of reading an application by a person is equally distributed among all the days they read the applications, and that they read an equal number of applications.

(if they dont, lets just holler sexual discrimination. hehehe) j/k

Posted by: Shashank on May 15, 2005

prashant, one more thing. your way, the answer is supposed to be 0.5.

thats given. think conditional probability.

or at least thats how i understand the question.

Posted by: Shashank on May 15, 2005

What is the probability of someone seeing my posts?

of course, if anyone answers this thatll make the probability one, ie, if i expect this to be answered then the answer to my question is known. But if i dont ask, i wont know the answer to my question. So am i answering my question by asking it?

hehe.

Posted by: Shashank on May 15, 2005

Hmm Shashank, I thought about the holidays bit. But since the EXACT days on which the holidays occur aren't given, you cannot assume either of them is at an advantage. Unless you know the exact distribution of holidays, you can very well assume them to be randomly distributed. The one thing I did overlook, though, was the fact that Salvador has holidays only in December and January. However, since both of them read apps for a total of 113 days, the apps/day is the same. Hence neither has a higher probability of reading a given app first.

Maybe

Posted by: Prashant on May 16, 2005

Ah no... I'll think about it

Posted by: Prashant on May 16, 2005

OK. i have the basics down. It's ridiculously simple once you acknowledge the fact that it isn't easy for manual counting. Unfortunately, i'd have to move hell and high water to do it the conventional way. But weird problems call for unconventional solutions, right?
It took over 2 hours for me to get out of the no microprocessor use frame of mind, but it looks good right now.

So, since this is MIT, i am assuming you will allow the use of a microprocessor. I dont have access to a C compiler for the next 20 hours or so, so the best estimate for a solution will be around noon GMT tomorrow.

Posted by: Shashank on May 16, 2005

Quick question - if there's anyone here to answer it. Are the apps read on Saturdays and Sundays as well? I am assuming they arent to make things simpler. I am also assuming that the application cannot be read by both of them on the same day.

Posted by: Shashank on May 17, 2005

You guys are awesome! During reading season we work one weekend day. It can be either Saturday or Sunday. For the sake of this question, let's say that Amy gets all of her work done on Saturdays, and Salvador procrastinates and does everything on Sundays.

Posted by: Ben on May 17, 2005

ok ben. Fortunately i havent been able to start as yet - something came up. I think putting in the change shouldnt be too hard.

Posted by: Shashank on May 17, 2005

OK. Here goes.

As my PIV, 2.4GHz Processor earns its wings by running the program, i'll try to explain what i've done (or am trying to do).

(im on another computer right now)

Consider the case where you know exactly when each person has taken his/her day off. Then, the probability is simply calculated by the following. For each day Salvador works, find the number of days AFTER that day on which Amy works. Add up the number obtained for each day of Salvador's work and you have the total number of favourable cases. The probability of success is given by

number of cases * (probability that Salvador reads on a specific day) * (probability that Amy reads on a specific day)

= number of cases * (1/113) * (1/113)

Now, in the problem, the specific days of vacation are unknowns, so you need to take into account every possible configuration of vacations.

What my program does (or attempts to do) is find, for all possible cases, the probability of Salvador reading the app first, and then add them up

Answer = [(Probability of each conf)*{(Probability of Salvador reading on a specific day) * (Probability of Amy reading the app on that specific day)}]
= [(1/total no of conf)*{(1/113) * (1/113)}]

So, once we have all possible cases,(a case here is where the days Salvador and Amy read the application are fixed, configuration is specified),

we have

Answer=(1/total no. of conf)*(1/113)*(1/113)*(Total number of cases).

Now. The program is written. It has about 16 nested for loops, the first 14 of which are to permute the vacation days. the last two are to find the number of cases for each configuration. My PIV has been running the program for over 10 minutes now. So i think you guys should use the faster computers at MIT to run this damn thing. I'll post the C++ source code after either one of two things happen. (a) My computer finishes the job or (b) I get sick of it and terminate it prematurely. If there are any questions about my method, ask me. If anyone can think of a faster way, tell me.

Whew.

(nope. still waiting)

Posted by: Shashank on May 17, 2005
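
For one fixed vacation configuration, the counting step described above can be sketched as follows. This is only an illustration: the function names and the toy schedules in the note below are invented here, and it assumes each reader is equally likely to read the application on any day they work.

```cpp
#include <cassert>
#include <vector>

// Counts pairs (s, a) where Salvador's reading day s comes strictly before
// Amy's reading day a. Both lists hold the day numbers each person works.
long countSalvadorFirst(const std::vector<int>& salvadorDays,
                        const std::vector<int>& amyDays) {
    long cases = 0;
    for (int s : salvadorDays)
        for (int a : amyDays)
            if (s < a) ++cases;
    return cases;
}

// cases * (1 / #Salvador days) * (1 / #Amy days): each reader is assumed
// equally likely to read the application on any of their working days.
double probSalvadorFirst(const std::vector<int>& salvadorDays,
                         const std::vector<int>& amyDays) {
    assert(!salvadorDays.empty() && !amyDays.empty());
    return static_cast<double>(countSalvadorFirst(salvadorDays, amyDays)) /
           (static_cast<double>(salvadorDays.size()) * amyDays.size());
}
```

With toy schedules {0, 1, 3} for Salvador and {1, 2, 4} for Amy there are 6 favourable pairs out of 9, so the function returns 6/9. In the real problem each list would have 113 entries (120 days minus 7 vacation days, ignoring weekends), which is where the 1/113 factors come from.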

    #include
    main()
    {
      int PSpDay,VacDays[2][7],totalPoss,totalConf,PROBABILITY;
      int i,a,j,k,l,m,o,p,q,r,s,t,u,v,w,x,y,z;
      PSpDay=1/113;
      totalPoss=0;
      totalConf=0;
      // Days are numbered 0-119 starting from November 1.
      // VacDays[0] is Salvador: i..m are his 5 December vacation days.
      // Days with day%7==5 are skipped for him.
      for (i=30;i<57;i++)
      {
        if(i%7==5) i++;
        VacDays[0][0]=i;
        for(j=i+1;j<58;j++)
        {
          if(j%7==5) j++;
          VacDays[0][1]=j;
          for(k=j+1;k<59;k++)
          {
            if(k%7==5) k++;
            VacDays[0][2]=k;
            for(l=k+1;l<60;l++)
            {
              if(l%7==5) l++;
              VacDays[0][3]=l;
              for(m=l+1;m<61;m++)
              {
                if(m%7==5) m++;
                VacDays[0][4]=m;
                // Salvador: o and p are his 2 January vacation days.
                for(o=61;o<91;o++)
                {
                  if(o%7==5) o++;
                  VacDays[0][5]=o;
                  for(p=o+1;p<92;p++)
                  {
                    if(p%7==5) p++;
                    VacDays[0][6]=p;
                    // VacDays[1] is Amy: q, r, s are her 3 November vacation days.
                    // Days with day%7==6 are skipped for her.
                    for(q=0;q<28;q++)
                    {
                      if(q%7==6) q++;
                      VacDays[1][0]=q;
                      for(r=q+1;r<29;r++)
                      {
                        if(r%7==6) r++;
                        VacDays[1][1]=r;
                        for(s=r+1;s<30;s++)
                        {
                          if(s%7==6) s++;
                          VacDays[1][2]=s;
                          // Amy: t is her December vacation day.
                          for(t=30;t<61;t++)
                          {
                            if(t%7==6) t++;
                            VacDays[1][3]=t;
                            // Amy: u and v are her 2 January vacation days.
                            for(u=61;u<91;u++)
                            {
                              if(u%7==6) u++;
                              VacDays[1][4]=u;
                              for(v=u+1;v<92;v++)
                              {
                                if(v%7==6) v++;
                                VacDays[1][5]=v;
                                // Amy: w is her February vacation day.
                                for(w=92;w<120;w++)
                                {
                                  if(w%7==6) w++;
                                  VacDays[1][6]=w;
                                  totalConf++;
                                  cout<<endl<<(100*i/30)<<" percet complete";
                                  // For this configuration, count pairs where Salvador
                                  // reads on day x and Amy reads on a later day y.
                                  for(x=0;x<119;x++)
                                  {
                                    if(x%7==5) x++;
                                    for(a=0;a<7;a++)
                                    {
                                      if (VacDays[0][a]==x) x++;
                                      if (x%7==5) x++;
                                    }
                                    for(y=x+1;y<120;y++)
                                    {
                                      if(y%7==6) y++;
                                      for(a=0;a<7;a++)
                                      {
                                        if(VacDays[1][a]==y) y++;
                                        if(y%7==6) y++;
                                      }
                                      totalPoss++;
                                    }
                                  }
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
      cout<<totalConf<<totalPoss;
      PROBABILITY=(1/totalConf)*PSpDay*PSpDay*totalPoss;
      cout<<endl<<PROBABILITY;

    }

Posted by: Shashank on May 17, 2005

the above code is C++, and works on a turbo c++ compiler. If you're using open source compilers such as GCC you need to modify it a bit. If you dont know how to do that let me know and i'll try to make the necessary changes.

And it didnt finish running on my computer. I interrupted it and am about to restart it. Let me know if anyone runs it successfully.

Posted by: Shashank on May 17, 2005

OOOPS - ERROR IN CODE.

There is the line

cout<<endl<<(100*i/30)<<" percet complete";

change that to

cout<<endl<<(100*(i-30)/30)<<" percent complete";

Ben, maybe you can change it in the original post, perhaps?

Posted by: Shashank on May 17, 2005

OK. There are more bugs. I dont know how it happened, but the first line is supposed to be #include

Ill try to see if theres more errors, although finding them will be tough

Posted by: Shashank on May 17, 2005

first line - #include (stdio.h)

change the brackets to the correct kind.

Posted by: Shashank on May 17, 2005

shit. not stdio.h

iostream.h

I guess id better go get some sleep

Posted by: Shashank on May 17, 2005

First - CORRECTIONS and ENHANCEMENTS

a) Change variable types

lines
int PSpDay,VacDays[2][7],totalPoss,totalConf,PROBABILITY;
int i,a,j,k,l,m,o,p,q,r,s,t,u,v,w,x,y,z;
become
int VacDays[2][7];
long double PSpDay,totalPoss,totalConf,PROBABILITY;
int i,a,j,k,l,m,o,p,q,r,s,t,u,v,w,x,y,z;

b)remove the line
cout<<endl<<(100*(i-30)/30)<<" percent complete";
It just takes too long. Of course, your screen will go blank and show no apparent activity for a long, long time.

c) change the last line

PROBABILITY=(1/totalConf)*PSpDay*PSpDay*totalPoss;

to
PROBABILITY=(PSpDay*PSpDay*totalPoss)/(totalConf);

Now that that is out of the way - to more important things. The method is not practical. I estimate it will take my computer 697.3 million years to reach a result. So until i figure out a more indirect approach, i am afraid i cannot solve the problem at hand.

Posted by: Shashank on May 18, 2005
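
A rough count shows why the brute-force search cannot finish. The sketch below multiplies the number of ways to pick each month's vacation days, ignoring weekends. That simplification is an assumption made here only to keep the count simple; the posted program skips some days, so its loop count differs. The helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Binomial coefficient C(n, k). Every intermediate value is itself a
// binomial coefficient, so the division is exact.
std::uint64_t choose(int n, int k) {
    assert(n >= 0 && k >= 0 && k <= n);
    std::uint64_t result = 1;
    for (int i = 1; i <= k; ++i)
        result = result * static_cast<std::uint64_t>(n - k + i) / i;
    return result;
}

// Number of ways Salvador and Amy can place their vacation days
// (weekends ignored, an assumption of this sketch).
std::uint64_t vacationConfigurations() {
    // Salvador: 5 days in December (31 days), 2 in January (31 days).
    std::uint64_t salvador = choose(31, 5) * choose(31, 2);
    // Amy: 3 in November (30), 1 in December (31), 2 in January (31),
    // 1 in February (28).
    std::uint64_t amy =
        choose(30, 3) * choose(31, 1) * choose(31, 2) * choose(28, 1);
    return salvador * amy;
}
```

Under these assumptions the product is a little over 10^17 configurations, and each one still needs the inner pair count on top.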

Shashank, after briefly reading your comments and the question, I don't believe that you are going to solve this problem using a "brute force" algorithm. There are simply too many combinations... I am going to look at the problem this weekend, hope your computer won't have it solved by then.
I bought a great book -- Introduction to Algorithms (MIT Press). It is slightly longer than my other comp books (it has over 1000 pages) but it covers so much cool stuff. Some of the tools such as "dynamic programming" might be used here...

Posted by: Simon on May 19, 2005
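
One way around the enumeration, offered as a sketch rather than a vetted solution: the number of working days is the same in every configuration, so the probability that a reader reads the application on a given day is just the chance that the day is not one of their vacation days, divided by their number of working days. Summing over days then needs only a single pass. Several assumptions here are not from the post: vacation days are uniformly random among the days a reader would otherwise work in that month, each reader is equally likely to read on any working day, the two readers are independent, day 0 is Monday, November 1, and a same-day read does not count as Salvador reading first. The weekend rule (Salvador off Saturdays, Amy off Sundays) follows Ben's later comment.

```cpp
#include <cassert>
#include <vector>

const int kDays = 120;                            // Nov 1 through Feb 28
const int kMonthStart[5] = {0, 30, 61, 92, 120};  // Nov, Dec, Jan, Feb, end
const int kSalvadorVacation[4] = {0, 5, 2, 0};
const int kAmyVacation[4] = {3, 1, 2, 1};

// P(reader reads the application on day d), for d = 0..119.
// weeklyOff is the weekday (0 = Monday) the reader never works.
std::vector<double> readingDistribution(int weeklyOff, const int vacation[4]) {
    std::vector<double> prob(kDays, 0.0);
    int workingDays = 0;
    for (int m = 0; m < 4; ++m) {
        int eligible = 0;
        for (int d = kMonthStart[m]; d < kMonthStart[m + 1]; ++d)
            if (d % 7 != weeklyOff) ++eligible;
        assert(vacation[m] <= eligible);
        workingDays += eligible - vacation[m];
        // Chance that an eligible day in this month is not a vacation day.
        double works = static_cast<double>(eligible - vacation[m]) / eligible;
        for (int d = kMonthStart[m]; d < kMonthStart[m + 1]; ++d)
            if (d % 7 != weeklyOff) prob[d] = works;
    }
    for (double& p : prob) p /= workingDays;
    return prob;
}

// P(Salvador's reading day is strictly before Amy's).
double expectedSalvadorFirst() {
    std::vector<double> sal = readingDistribution(5, kSalvadorVacation);
    std::vector<double> amy = readingDistribution(6, kAmyVacation);
    double total = 0.0;
    double amyLater = 0.0;  // P(Amy reads after day d), built from the end
    for (int d = kDays - 1; d >= 0; --d) {
        total += sal[d] * amyLater;
        amyLater += amy[d];
    }
    return total;
}
```

No expected value is quoted here, since the result depends entirely on the assumptions listed above.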

I don't think there's any way you can use DP here.

Hmm, I'm going to have to work on this over the weekend. Damn it Ben...and I was just getting back to lazy. Haha.

Later,
Mike.

Posted by: Michael Borohovski on May 20, 2005

best of luck to you guys. i have given up trying - at least until monday. (i have an exam sunday, can't afford to screw it up) i'll give it another shot later

Posted by: Shashank on May 20, 2005

Shashank- You are doing some great work!!!

Have you considered just finding the extreme values? By that I mean that since there will be a range of probabilities depending on which days of the month each person took their vacation days, why not calculate the solution when the first person takes all vacation days as early as possible in each month and the second person takes all vacation days as late as possible in each month. Then do it a second time with the first person as late as possible and the second person as early as possible. This is not as good as your attempt to find all possible solutions, but you will have found an interval that all solutions are included in.

Posted by: Keith W on May 20, 2005
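
Keith's bounding idea can be sketched like this. It is only an illustration under assumptions that are not all in the thread: day 0 is Monday, November 1; Salvador never works Saturdays and Amy never works Sundays; "as early as possible" means the first days of the month the person would otherwise work; and a same-day read does not count as Salvador reading first. All names are hypothetical.

```cpp
#include <cassert>
#include <vector>

const int kMonthStart[5] = {0, 30, 61, 92, 120};  // Nov, Dec, Jan, Feb, end
const int kSalvadorVacation[4] = {0, 5, 2, 0};
const int kAmyVacation[4] = {3, 1, 2, 1};

// Working days when each month's vacation is pushed to the start (early)
// or the end (!early) of the month. weeklyOff: 0 = Monday ... 6 = Sunday.
std::vector<int> workingDays(int weeklyOff, const int vacation[4], bool early) {
    std::vector<int> days;
    for (int m = 0; m < 4; ++m) {
        std::vector<int> eligible;
        for (int d = kMonthStart[m]; d < kMonthStart[m + 1]; ++d)
            if (d % 7 != weeklyOff) eligible.push_back(d);
        int n = static_cast<int>(eligible.size());
        assert(vacation[m] <= n);
        int first = early ? vacation[m] : 0;
        int last = early ? n : n - vacation[m];
        days.insert(days.end(), eligible.begin() + first, eligible.begin() + last);
    }
    return days;
}

// Fraction of (Salvador day, Amy day) pairs with Salvador strictly first.
double probSalvadorFirst(const std::vector<int>& sal, const std::vector<int>& amy) {
    long cases = 0;
    for (int s : sal)
        for (int a : amy)
            if (s < a) ++cases;
    return static_cast<double>(cases) /
           (static_cast<double>(sal.size()) * amy.size());
}

// amyEarly = true: Amy's vacations early, Salvador's late (favours Salvador).
// amyEarly = false: the reverse.
double extremeCase(bool amyEarly) {
    return probSalvadorFirst(workingDays(5, kSalvadorVacation, !amyEarly),
                             workingDays(6, kAmyVacation, amyEarly));
}
```

`extremeCase(true)` and `extremeCase(false)` give the two ends of the interval under these assumptions. With different weekend or tie conventions the numbers will shift a little.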

i've come up with another approach. It has a relatively finite set of cases, but the calculations themselves are more complex and the statements more abstract. I can't explain it here since it needs math symbols so on monday, once i have some free time, i'll get down to putting it in words. As for the calculations themselves - i'm terrified at the prospect of coding the thing, and i'll see to it only on monday.

About using only the extreme values, i think doing as you (keith) say is harder for me since it involves subjectively deciding the extreme cases, which i feel may not be the apparently obvious cases with vacations at ends of months because the days of the vacation are specified for each month.

Posted by: Shashank on May 21, 2005

ok. dont have the answer yet,but so far i've come up with the algorithm. i've been really lazy the last week and havent done any further work on it (as in actually writing the program)

let E be the event of salvador reading the app before amy.

First, compartmentalize the problem into months - each representing the month in which salvador reads the application. From now on, whenever i say something like P(E/Feb) i mean probability of the event E occurring given that Salvador reads the application in Feb.

let s1, s2, s3, s4 be the number of holidays salvador took in each of the months, a1, a2, a3, a4 be those for Amy.

wherever i needed mathematical structures i put them in <.>.

FEBRUARY (temporarily consider only feb for easier understanding)

P(E) = P(Sal reading on the nth day of Feb) * P(Amy reading after nth day of Feb)

P(E)=<summation of n over the 28 days of feb> P(Sal reading on the nth day)*[P(no. of days Amy works after n)*P(Amy reading on a specific day)]

now, for a single n,
let

P(Sal reading on the nth day) = A
and
[P(no. of days Amy works after n)*P(Amy reading on a specific day)]= B

Now, for A ::

A= 0 if n corresponds to a Saturday
<(28-1)C(s4)> / <(28)C(s4)> * 1/113 if n does not correspond to a Saturday.

{prob that it isnt one of his holidays * prob of him reading it}

As for B ::

no of sundays in ANY given month after the nth day
of that month is given by

[ {(no of days in the month - n)- no of days between the last day of the month and the last sunday, both days inclusive}/7 ] +1,

where [.] correspond to the greatest integer function.

now the proof for this is simple and i am not putting it here because it's kind of hard to explain, but ask me if you don't understand it.

so, for february, no of sundays after the nth day is given by

nS= [{(28-n)-2}/7]+1 , (for the year 2005 only)

so, B is given by

<(n)C(a4)>/<(28)C(a4)> * (1/120) * {(28-n) - nS - 0}

which is the probability that all of Amy's holidays in the month are before n * prob that amy reads on a day after n, which in turn is no of days amy works/120.

now, clearly,

B= <summation over i=0 to a4><summation over n=1 to 28>

<(n)C(a4-i)>*<(28-n)C(i)>/<(28)C(a4)>*(1/120)*{(28-n) - nS(<-this is a function of n)-i}

which is a much more realistic calculation than the previous method i gave.

JANUARY.

A= 0 if Saturday
<(31-1)C(s3)>/<(31)C(s3)> * (1/120) if not

B is the same as for feb, except change 28 to 31, reformulate nS, change all a4 to a3, and add (28-a4-4) to the part inside the brackets, ie,

B= <summation over i=0 to a3><summation over n=1 to 31>

<(n)C(a3-i)>*<(31-n)C(i)>/<(31)C(a3)>*(1/120)*{(31-n) - nS(<-this is a function of n)-i+(28-4-a4)}

the same should be extrapolated to the other two months.

for each month, find A*B

then add up the 4 numbers which you get.

Since i'm still in lazyland, i dont want to code the thing before i'm absolutely confident there are no stupid mistakes anywhere. So any comments will be helpful.

Posted by: Shashank on May 29, 2005
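
The Sunday-counting formula above is easy to get wrong in code, because the "greatest integer function" rounds toward negative infinity while integer division in C++ truncates toward zero. Here is a small sketch, with made-up function names, that checks the formula against a direct count. It takes the day of the month of the last Sunday as an input, which for February 2005 is the 27th.

```cpp
#include <cassert>

// Greatest integer <= a / b for b > 0 (C++ '/' truncates toward zero instead).
int floorDiv(int a, int b) {
    assert(b > 0);
    int q = a / b;
    if (a % b != 0 && a < 0) --q;
    return q;
}

// Sundays strictly after day n of a month, using the formula from the comment:
// [ {(days in month - n) - tail} / 7 ] + 1, where tail is the number of days
// from the last Sunday through the end of the month, both inclusive.
int sundaysAfter(int n, int daysInMonth, int lastSunday) {
    int tail = daysInMonth - lastSunday + 1;
    return floorDiv((daysInMonth - n) - tail, 7) + 1;
}

// Direct count, for comparison: step back a week at a time from the last Sunday.
int sundaysAfterByCounting(int n, int lastSunday) {
    int count = 0;
    for (int d = lastSunday; d >= 1 && d > n; d -= 7) ++count;
    return count;
}
```

For February 2005, `sundaysAfter(1, 28, 27)` is 4 and `sundaysAfter(27, 28, 27)` is 0. With plain truncating division the second call would come out as 1.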

damn. it changed the things. i should've used different brackets. i didn't save it, so i'll retype it later. Ben, would you have the text before the program reformatted the stuff in the brackets? if you do, could you repost the above after changing the <> to something else?

Posted by: Shashank on May 29, 2005

I tried out the extreme limit method and got these answers.

When Amy takes her hols at the beginning of each month and salvador at the end, the answer is **.518**

When Salvador takes all his hols at the beginning of the month and Amy at the end, the answer is **.486**

These are fairly precise, so the true answer most likely lies between these limits.

Posted by: Shirish on May 31, 2005

the answer should then lie in a range of width .032, which is about 3.2%. that's a lot of uncertainty.....

and i'm still not very sure those are the extreme cases

Posted by: Shashank on May 31, 2005

Hey Shashank, I tried to fix it - did it work? Sorry about that - the program strips out html code...

Posted by: Ben on June 4, 2005

thanks ben. you saved me a LOT of effort. thanks.

(it must have taken you some time to fix it though...)

Posted by: Shashank on June 9, 2005

I've (finally) started work on it.

Finished February so far.

0.019506.

Still working on the other months...

Posted by: Shashank on July 3, 2005

Correction in February's number - it is supposed to be 0.018658
January - 0.063617

January & February - 0.82275

Posted by: Shashank on July 4, 2005

0.082275

Posted by: Shashank on July 4, 2005

Did you delete my post with the answer for some reason?

Posted by: Shashank on July 6, 2005

im posting the numbers again..

ok. i hope its okay

Probability - 0.493839

February - 0.027552

January - 0.0938

December - 0.137165

November - 0.235323

I believe these to be accurate.

It is the result of the algorithm I posted earlier, with a few minor changes

Posted by: Shashank on July 6, 2005
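
The overall figure should be the sum of the four monthly contributions, which is quick to confirm. The snippet below uses only the numbers quoted above; the function name is invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Adds up the per-month contributions to P(Salvador reads first).
double sumContributions(const std::vector<double>& parts) {
    assert(!parts.empty());
    double total = 0.0;
    for (double p : parts) total += p;
    return total;
}
```

`sumContributions({0.027552, 0.0938, 0.137165, 0.235323})` comes to 0.49384, which agrees with the posted 0.493839 up to rounding in the last digit.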

oh i am quite late!! this is a very early post.

someone has already taken my prize!!

Posted by: vivek on September 17, 2005