Happy Cinco De Mayo!!! by Ben Jones
Admissions counselors make the sexiest models. Oh, and a probability question.
Admissions counselors make the sexiest models. Check out these t-shirts, which were made by the good folks in LUChA. Awesome.
Math question: if Salvador and Amy each read 922 applications, and we received 10,500 applications this year, what is the probability that one of them read your application (assuming you were in the applicant pool)? Knowing that each app gets two reads, what is the probability that both of them read your application? Those are easy, right?
Okay, reading season lasts from early November through late February (for this problem we’ll say November 1 through February 28). Salvador takes 5 days of vacation in December and 2 days in January. Amy takes 3 days of vacation in November, 1 in December, 2 in January, and 1 in February. All else being equal – for applications for which Salvador and Amy are the two readers, what is the probability that Salvador read a given application before Amy?
First person to answer correctly gets a prize. Then again, how would I even know what the correct answer is? Hehe.
– – –
Edit: okay, first person to provide an answer (with documented process) that gets a stamp of approval from Keith gets the prize.
What we really need to know is, what’s the prize?
who is that salvador- he’s hot
ok i got the first two…
but im not tellin b4 anybody else does ..lol
they both look like stars.
does amy have a BF?
Well, for the first one:
1) I am going to assume the two piques are independent. I mean, if Amy chooses an application it doesn’t affect Salvador’s choice (although that might not be true but would render this unsolvable).
For each one:
P(Salvador reads it) = 922/10500
P(Amy reads it) = 922/10500
To calculate the probability that either of them will read it, it’s:
1 – P(Both haven’t read it)
= 1 – (9578/10500)^2 or 0.1679
The probability that both have read it is just
(922/10500)^2 or 0.007711
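For anyone who wants to check the arithmetic above, here is a minimal sketch in C++. It keeps the same independence assumption as the comment (which may not hold exactly, since the two reads of one application go to different people), and the function names `probAtLeastOne` and `probBoth` are made up for illustration.

```cpp
#include <cassert>

// Chance that at least one of two independent readers, each reading
// `readsEach` out of `total` applications, read a given application.
double probAtLeastOne(double readsEach, double total) {
    assert(total > 0 && readsEach >= 0 && readsEach <= total);
    double miss = 1.0 - readsEach / total;  // one reader did NOT read it
    return 1.0 - miss * miss;               // 1 - P(both missed it)
}

// Chance that both independent readers read a given application.
double probBoth(double readsEach, double total) {
    assert(total > 0 && readsEach >= 0 && readsEach <= total);
    double hit = readsEach / total;         // one reader DID read it
    return hit * hit;
}
```

Calling `probAtLeastOne(922, 10500)` and `probBoth(922, 10500)` should reproduce the two figures above to the digits quoted.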
I’m trying to work on the second one now…
although it’s sort of weird.
HOLA!
It’s senior day as well as Cinco de Mayo. 05/05/05.
ADIOS!~Kiersten~
ps how about you ask a question about the brain, or rock n roll, then we’ll talk…
hi ben, when will you know about the waitlist?
Well done Mike! Full marks to Mike Axiak for problem 1.
I think that Ben should have to answer the second one.
Yea… the second one IS weirder than I thought…I tried playing with geometric distributions to no avail.
Oh well…maybe I’ll get it in a few days
Also, replace ‘piques’ with ‘picks’ . (typoes…)
Mike – you rock! Can’t wait to see what you come up with for the second part. No worries on the typo; I had one of my own in the word “January.”
Me – (well not me, but “me”) – middle of May is when they’ll start making decisions, I believe. I should know a bit earlier than that whether or not we’re even going to the waitlist… hopefully yes, since that was the intention, but it all depends on yield.
Keith – thanks for being the master of ceremonies! As for answering the second part… google can’t help me with this one!
Ben, I think answer number 2 is going to take a while. You’ve given us a hard task. You didn’t even tell us how often they get applications, and how many. Mike and I together are getting a bit stuck! We’ll figure something out eventually though.
Eh.. just been back from a engg. test and came to this blog to find a probability question
The second part (unless i’m dumb) is easier than the first. You see, either Salvador OR Amy will read an app first. So, probability that either of them reads a given app first is half of the total probability.
Hence, answer = [(922/10500)^2]/2 = 0.003855
Didn’t need the number of days they work or the number of apps read/day. [Correct me if I’m wrong.]
Oh, I was referring to Problem 2, not part(b) of problem one. The vacation days just don’t work out.
sorry- havent been here in a while.
Prashant, the prob isn’t 50:50. The fact that Amy took holidays before Salvador slightly decreases her chances. Of course, once you put everything together Salvador may be the one at a disadvantage (assuming reading an application first is something they want to do), but it isn’t a direct 50%. I’ve just seen the thing, so I’ll try it out for a couple of days and see if I can get anywhere with it. Right now I’m just delighted such questions don’t come in any of our exams.
Jane, does it matter how many applications they get? I think it’ll be safe to assume that the probability of reading an application by a person is equally distributed among all the days they read the applications, and that they read an equal number of applications.
(if they dont, lets just holler sexual discrimination. hehehe) j/k
Prashant, one more thing. Your way, the answer is supposed to be 0.5.
“for applications for which Salvador and Amy are the two readers”
That’s given, so the (922/10500)^2 factor drops out. Think conditional probability.
Or at least that’s how I understand the question.
What is the probability of someone seeing my posts?
of course, if anyone answers this thatll make the probability one, ie, if i expect this to be answered then the answer to my question is known. But if i dont ask, i wont know the answer to my question. So am i answering my question by asking it?
hehe.
OK. I have the basics down. It’s ridiculously simple once you acknowledge the fact that it isn’t easy for manual counting. Unfortunately, I’d have to move hell and high water to do it the conventional way. But weird problems call for unconventional solutions, right?
It took over 2 hours for me to get out of the no-microprocessor-use frame of mind, but it looks good right now.
So, since this is MIT, I am assuming you will allow the use of a microprocessor. I don’t have access to a C compiler for the next 20 hours or so, so the best estimate for a solution will be around noon GMT tomorrow.
Hmm Shashank, I thought about the holidays bit. But since the EXACT days on which the holidays occur aren’t given, you cannot assume either of them is at an advantage. Unless you know the exact distribution of holidays, you can very well assume them to be randomly distributed. The one thing I did overlook, though, was the fact that Salvador has holidays only in December and January. However, since both of them read apps for a total of 113 days, the apps/day is the same. Hence neither has a higher probability of reading a given app first.
Maybe
Ah no… I’ll think about it
ok ben. Fortunately i havent been able to start as yet – something came up. I think putting in the change shouldnt be too hard.
OK. Here goes.
As my PIV, 2.4GHz Processor earns its wings by running the program, i’ll try to explain what i’ve done (or am trying to do).
(im on another computer right now)
Consider the case where you know exactly when each person has taken his/her day off. Then, the probability is simply calculated by the following. For each day Salvador works, find the number of days AFTER that day on which Amy works. Add up the number obtained for each day of Salvador’s work and you have the total number of favourable cases. The probability of success is given by
number of cases * (probability that Salvador reads on a specific day) * (probability that Amy reads on a specific day)
= number of cases * (1/113) * (1/113)
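A rough sketch of the known-schedule case described above, assuming each reader's working days are given as one true/false flag per day. The name `probFirstBeforeSecond` and the flag vectors are illustrative and not taken from the program posted further down; the sketch also divides by each reader's actual number of working days rather than a fixed 113.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Probability that the first reader reads strictly before the second,
// when each reader picks one of their own working days uniformly at random.
// firstWorks[d] / secondWorks[d] say whether that reader works on day d.
double probFirstBeforeSecond(const std::vector<bool>& firstWorks,
                             const std::vector<bool>& secondWorks) {
    assert(firstWorks.size() == secondWorks.size());
    long pairs = 0;        // favourable (first day, later second day) pairs
    long secondLater = 0;  // second reader's working days after the current day
    long firstDays = 0, secondDays = 0;
    // Walk backwards so secondLater always counts strictly later days.
    for (std::size_t d = firstWorks.size(); d-- > 0; ) {
        if (firstWorks[d]) { pairs += secondLater; ++firstDays; }
        if (secondWorks[d]) { ++secondLater; ++secondDays; }
    }
    assert(firstDays > 0 && secondDays > 0);
    return double(pairs) / (double(firstDays) * double(secondDays));
}
```

Walking the days from last to first keeps a running count of the second reader's later working days, so the pair count takes one pass instead of a loop inside a loop.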
Now, in the problem, the specific days of vacation are unknowns, so you need to take into account every possible configuration of vacations.
What my program does (or attempts to do) is find, for all possible configurations, the probability of Salvador reading the app first, and then add them up.
Contribution of each case = [(Probability of each conf)*{(Probability of Salvador reading on a specific day) * (Probability of Amy reading the app on a specific day)}]
= [(1/total no of conf)*{(1/113) * (1/113)}]
So, once we have all possible cases (a case here is one where the configuration is specified and the days on which Salvador and Amy read the application are fixed),
we have
Answer=(1/total no. of conf)*(1/113)*(1/113)*(Total number of cases).
Now. The program is written. It has about 16 nested for loops, the first 14 of which are to permute the vacation days. The last two are to find the number of cases for each configuration. My PIV has been running the program for over 10 minutes now. So I think you guys should use the faster computers at MIT to run this damn thing. I’ll post the C++ source code after either one of two things happens: (a) my computer finishes the job or (b) I get sick of it and terminate it prematurely. If there are any questions about my method, ask me. If anyone can think of a faster way, tell me.
Whew.
(nope. still waiting)
#include <iostream.h>
main()
{
  // Days are numbered 0-119 starting from November 1.
  // VacDays[0] holds Salvador's 7 vacation days, VacDays[1] holds Amy's.
  int PSpDay,VacDays[2][7],totalPoss,totalConf,PROBABILITY;
  int i,a,j,k,l,m,o,p,q,r,s,t,u,v,w,x,y,z;
  PSpDay=1/113;
  totalPoss=0;
  totalConf=0;
  // Salvador: 5 vacation days in December (days 30-60)
  for (i=30;i<57;i++)
  {
    if(i%7==5) i++;
    VacDays[0][0]=i;
    for(j=i+1;j<58;j++)
    {
      if(j%7==5) j++;
      VacDays[0][1]=j;
      for(k=j+1;k<59;k++)
      {
        if(k%7==5) k++;
        VacDays[0][2]=k;
        for(l=k+1;l<60;l++)
        {
          if(l%7==5) l++;
          VacDays[0][3]=l;
          for(m=l+1;m<61;m++)
          {
            if(m%7==5) m++;
            VacDays[0][4]=m;
            // Salvador: 2 vacation days in January (days 61-91)
            for(o=61;o<91;o++)
            {
              if(o%7==5) o++;
              VacDays[0][5]=o;
              for(p=o+1;p<92;p++)
              {
                if(p%7==5) p++;
                VacDays[0][6]=p;
                // Amy: 3 vacation days in November (days 0-29)
                for(q=0;q<28;q++)
                {
                  if(q%7==6) q++;
                  VacDays[1][0]=q;
                  for(r=q+1;r<29;r++)
                  {
                    if(r%7==6) r++;
                    VacDays[1][1]=r;
                    for(s=r+1;s<30;s++)
                    {
                      if(s%7==6) s++;
                      VacDays[1][2]=s;
                      // Amy: 1 vacation day in December
                      for(t=30;t<61;t++)
                      {
                        if(t%7==6) t++;
                        VacDays[1][3]=t;
                        // Amy: 2 vacation days in January
                        for(u=61;u<91;u++)
                        {
                          if(u%7==6) u++;
                          VacDays[1][4]=u;
                          for(v=u+1;v<92;v++)
                          {
                            if(v%7==6) v++;
                            VacDays[1][5]=v;
                            // Amy: 1 vacation day in February (days 92-119)
                            for(w=92;w<120;w++)
                            {
                              if(w%7==6) w++;
                              VacDays[1][6]=w;
                              totalConf++;
                              cout<<endl<<(100*i/30)<<" percet complete";
                              // For this configuration, count the cases: Salvador reads on day x, Amy on a later day y
                              for(x=0;x<119;x++)
                              {
                                if(x%7==5) x++;
                                for(a=0;a<7;a++)
                                {
                                  if (VacDays[0][a]==x) x++;
                                  if (x%7==5) x++;
                                }
                                for(y=x+1;y<120;y++)
                                {
                                  if(y%7==6) y++;
                                  for(a=0;a<7;a++)
                                  {
                                    if(VacDays[1][a]==y) y++;
                                    if(y%7==6) y++;
                                  }
                                  totalPoss++;
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
  cout<<totalConf<<totalPoss;
  PROBABILITY=(1/totalConf)*PSpDay*PSpDay*totalPoss;
  cout<<endl<<PROBABILITY;
}
The above code is C++, and works on a Turbo C++ compiler. If you’re using open source compilers such as GCC you need to modify it a bit. If you don’t know how to do that let me know and I’ll try to make the necessary changes.
And it didn’t finish running on my computer. I interrupted it and am about to restart it. Let me know if anyone runs it successfully.
OOOPS – ERROR IN CODE.
There is the line
cout<<endl<<(100*i/30)<<" percet complete";
change that to
cout<<endl<<(100*(i-30)/30)<<" percent complete";
Ben, maybe you can change it in the original post, perhaps?
Quick question – if there’s anyone here to answer it. Are the apps read on Saturdays and Sundays as well? I am assuming they aren’t, to make things simpler. I am also assuming that the application cannot be read by both of them on the same day.
OK. There are more bugs. I dont know how it happened, but the first line is supposed to be #include
I’ll try to see if there are more errors, although finding them will be tough
first line – #include (stdio.h)
change the brackets to the correct kind.
shit. not stdio.h
iostream.h
I guess id better go get some sleep
You guys are awesome! During reading season we work one weekend day. It can be either Saturday or Sunday. For the sake of this question, let’s say that Amy gets all of her work done on Saturdays, and Salvador procrastinates and does everything on Sundays.
First – CORRECTIONS and ENHANCEMENTS
a) Change variable types
lines
int PSpDay,VacDays[2][7],totalPoss,totalConf,PROBABILITY;
int i,a,j,k,l,m,o,p,q,r,s,t,u,v,w,x,y,z;
become
int VacDays[2][7];
long double PSpDay,totalPoss,totalConf,PROBABILITY;
int i,a,j,k,l,m,o,p,q,r,s,t,u,v,w,x,y,z;
b) Remove the line
cout<<endl<<(100*(i-30)/30)<<" percent complete";
It just takes too long. Of course, your screen will go blank and show no apparent activity for a long, long time.
c) change the last line
PROBABILITY=(1/totalConf)*PSpDay*PSpDay*totalPoss;
to
PROBABILITY=(PSpDay*PSpDay*totalPoss)/(totalConf);
Now that that is out of the way – to more important things. The method is not practical. I estimate it will take my computer 697.3 million years to reach a result. So until i figure out a more indirect approach, i am afraid i cannot solve the problem at hand.
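To get a feel for why the brute-force loops cannot finish, here is a hedged sketch that only counts the vacation configurations. It assumes any day of the month can be a vacation day (the weekly day off is ignored, so this slightly overcounts compared with the posted program), and the names `choose` and `countConfigurations` are illustrative.

```cpp
#include <cassert>

// Binomial coefficient "n choose k". After step i the running value is
// C(n-k+i, i), so every intermediate result is an exact integer.
unsigned long long choose(int n, int k) {
    assert(n >= 0 && k >= 0 && k <= n);
    unsigned long long result = 1;
    for (int i = 1; i <= k; ++i) {
        result = result * (n - k + i) / i;
    }
    return result;
}

// Number of ways both readers can place their vacation days.
unsigned long long countConfigurations() {
    // Salvador: 5 days in December (31 days), 2 in January (31 days).
    unsigned long long salvador = choose(31, 5) * choose(31, 2);
    // Amy: 3 in November (30), 1 in December (31),
    //      2 in January (31), 1 in February (28).
    unsigned long long amy = choose(30, 3) * choose(31, 1)
                           * choose(31, 2) * choose(28, 1);
    return salvador * amy;
}
```

Under these assumptions the count comes out on the order of 10^17 configurations, and each one still needs its own pass over the day pairs.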
Best of luck to you guys. I have given up trying, at least until Monday. (I have an exam Sunday, can’t afford to screw it up.) I’ll give it another shot later.
I don’t think there’s any way you can use DP here.
Hmm, I’m going to have to work on this over the weekend. Damn it Ben…and I was just getting back to lazy. Haha.
Later,
Mike.
Shashank- You are doing some great work!!!
Have you considered just finding the extreme values? By that I mean that since there will be a range of probabilities depending on which days of the month each person took their vacation days, why not calculate the solution when the first person takes all vacation days as early as possible in each month and the second person takes all vacation days as late as possible in each month. Then do it a second time with the first person as late as possible and the second person as early as possible. This is not as good as your attempt to find all possible solutions, but you will have found an interval that all solutions are included in.
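The extreme-value idea above could be sketched roughly like this. Everything here is an assumption made for illustration: days are numbered 0 to 119 from November 1, each reader's weekly day off is given as a `day % 7` value (the posted program uses 5 for Salvador and 6 for Amy), vacation days are packed at the very start or very end of each month, and the names `buildSchedule` and `probFirstBeforeSecond` are invented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kMonthLen[4] = {30, 31, 31, 28};  // November, December, January, February

// Builds a 120-day working schedule. Each month's vacation days are packed
// at the start (early == true) or the end of the month, skipping the weekly day off.
std::vector<bool> buildSchedule(const int vacation[4], int weekdayOff, bool early) {
    std::vector<bool> works;
    int monthStart = 0;  // day 0 = November 1
    for (int m = 0; m < 4; ++m) {
        std::vector<bool> month(kMonthLen[m], true);
        int left = vacation[m];
        for (int i = 0; i < kMonthLen[m]; ++i) {
            int idx = early ? i : kMonthLen[m] - 1 - i;
            if ((monthStart + idx) % 7 == weekdayOff) {
                month[idx] = false;   // weekly day off, not a vacation day
            } else if (left > 0) {
                month[idx] = false;   // vacation day
                --left;
            }
        }
        assert(left == 0);
        works.insert(works.end(), month.begin(), month.end());
        monthStart += kMonthLen[m];
    }
    return works;
}

// Probability that the first reader reads strictly before the second, with
// each reading day picked uniformly from that reader's working days.
double probFirstBeforeSecond(const std::vector<bool>& first,
                             const std::vector<bool>& second) {
    assert(first.size() == second.size());
    long pairs = 0, secondLater = 0, firstDays = 0, secondDays = 0;
    for (std::size_t d = first.size(); d-- > 0; ) {
        if (first[d]) { pairs += secondLater; ++firstDays; }
        if (second[d]) { ++secondLater; ++secondDays; }
    }
    assert(firstDays > 0 && secondDays > 0);
    return double(pairs) / (double(firstDays) * double(secondDays));
}
```

Comparing Salvador-late against Amy-early, and then the reverse, gives two numbers that bracket the packed-at-the-ends cases. Whether those really are the extremes over all configurations is a separate question this sketch does not settle.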
Shashank, after briefly reading your comments and the question, I don’t believe that you are going to solve this problem using a “brute force” algorithm. There are simply too many combinations… I am going to look at the problem this weekend; hope your computer won’t have it solved by then.
I bought a great book, Introduction to Algorithms (MIT Press). It is slightly longer than my other comp books (it has over 1000 pages) but it covers so much cool stuff. Some of the tools such as “dynamic programming” might be used here…
I’ve come up with another approach, which has a relatively small set of cases, although the calculations themselves are more complex and the statements more abstract. I can’t explain it here since it needs math symbols, so on Monday, once I have some free time, I’ll get down to putting it in words. As for the calculations themselves, I’m terrified at the prospect of coding the thing, and I’ll see to it only on Monday.
About using only the extreme values, I think doing as you (Keith) say is harder for me, since it involves subjectively deciding the extreme cases, which I feel may not be the apparently obvious cases with vacations at the ends of months, because the number of vacation days is specified for each month.
ok. dont have the answer yet,but so far i’ve come up with the algorithm. i’ve been really lazy the last week and havent done any further work on it (as in actually writing the program)
Let E be the event of Salvador reading the app before Amy.
First, compartmentalize the problem into months – each representing the month in which Salvador reads the application. From now on, whenever I say something like P(E/Feb) I mean the probability of the event E occurring given that Salvador reads the application in Feb.
Let s1, s2, s3, s4 be the number of holidays Salvador took in each of the months, and a1, a2, a3, a4 be those for Amy.
Wherever I needed mathematical structures I put them in <.>.
FEBRUARY (temporarily consider only feb for easier understanding)
let salvador read the application on the nth day of february.
P(E) = P(Sal reading on the nth day of Feb) * P(Amy reading after nth day of Feb)
P(E)=<summation of n over the 28 days of Feb> P(Sal reading on the nth day)*[P(no. of days Amy works after n)*P(Amy reading on a specific day)]
Now, for a single n,
let
P(Sal reading on the nth day) = A
and
[P(no. of days Amy works after n)*P(Amy reading on a specific day)] = B
Now, for A ::
A = 0 if n corresponds to a Saturday
A = <(28-1)C(s4)> / <(28)C(s4)> * (1/113) if n does not correspond to a Saturday.
{prob that it isn’t one of his holidays * prob of him reading it}
As for B ::
The no. of Sundays in ANY given month after the nth day of that month is given by
[ {(no of days in the month – n) – no of days between the last day of the month and the last Sunday, both days inclusive}/7 ] + 1,
where [.] corresponds to the greatest integer function.
Now the proof for this is simple and I am not putting it here because it’s kind of hard to explain, but ask me if you don’t understand it.
so, for february, no of sundays after the nth day is given by
nS= [{(28-n)-2}/7]+1 , (for the year 2005 only)
so, B is given by
<(n)C(a4)>/<(28)C(a4)> * (1/120) * {(28-n) – nS – 0}
which is the probability that all of Amy’s holidays in the month are before n * prob that Amy reads on a day after n, which in turn is no of days Amy works/120.
Now, clearly,
B= <summation over i=0 to a4><summation over n=1 to 28>
<(n)C(a4-i)>*<(28-n)C(i)>/<(28)C(a4)>*(1/120)*{(28-n) – nS(<-this is a function of n) – i}
which is a much more realistic calculation than the previous method I gave.
JANUARY.
A = 0 if Saturday
A = <(31-1)C(s3)>/<(31)C(s3)> * (1/120) if not
B is the same as for Feb, except change 28 to 31, reformulate nS, change all a4 to a3, and add (28-a4-4) to the part inside the brackets, i.e.,
B= <summation over i=0 to a3><summation over n=1 to 31>
<(n)C(a3-i)>*<(31-n)C(i)>/<(31)C(a3)>*(1/120)*{(31-n) – nS(<-this is a function of n) – i + (28-4-a4)}
The same should be extrapolated to the other two months.
for each month, find A*B
then add up the 4 numbers which you get.
that be the answer.
Since I’m still in lazyland, I don’t want to code the thing before I’m absolutely confident there are no stupid mistakes anywhere. So any comments will be helpful.
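One possible way to cross-check a month-by-month formula like this without enumerating configurations is sketched below. It is not the algorithm above but a swapped-in averaging argument, and it rests on stated assumptions: vacation days in each month are chosen uniformly at random among the days that are not the reader's weekly day off, each reader's reading day is uniform over their working days, and the two readers are independent. Under those assumptions only each day's chance of being a working day matters. The names `readDayDistribution` and `probFirstBeforeSecond` are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kMonthLen[4] = {30, 31, 31, 28};  // November, December, January, February

// Probability that a reader reads the application on each of the 120 days,
// averaged over all placements of vacation[m] vacation days within month m.
// weekdayOff is the reader's weekly day off as a (day % 7) value, -1 for none.
std::vector<double> readDayDistribution(const int vacation[4], int weekdayOff) {
    std::vector<double> p;
    int day = 0;               // day 0 = November 1
    double workingDays = 0.0;  // same total in every configuration
    for (int m = 0; m < 4; ++m) {
        int eligible = 0;      // days in this month that are not the weekly day off
        for (int d = 0; d < kMonthLen[m]; ++d)
            if ((day + d) % 7 != weekdayOff) ++eligible;
        assert(eligible > 0 && vacation[m] >= 0 && vacation[m] <= eligible);
        for (int d = 0; d < kMonthLen[m]; ++d, ++day) {
            bool off = (day % 7 == weekdayOff);
            // Chance that this day is a working day for the reader.
            p.push_back(off ? 0.0 : 1.0 - double(vacation[m]) / eligible);
        }
        workingDays += eligible - vacation[m];
    }
    for (double& x : p) x /= workingDays;  // uniform pick among working days
    return p;
}

// Sum over all day pairs d1 < d2 of first[d1] * second[d2].
double probFirstBeforeSecond(const std::vector<double>& first,
                             const std::vector<double>& second) {
    assert(first.size() == second.size());
    double total = 0.0, laterMass = 0.0;
    for (std::size_t d = first.size(); d-- > 0; ) {
        total += first[d] * laterMass;  // second reader reads strictly later
        laterMass += second[d];
    }
    return total;
}
```

With the vacation counts from the post ({0, 5, 2, 0} for Salvador, {3, 1, 2, 1} for Amy) and the weekly days off used in the posted program (5 and 6), the two calls give a number that can be compared against the month-by-month totals. Note that it counts same-day reads as neither reader being first.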
Damn. It changed the things. I should’ve used different brackets. I didn’t save it, so I’ll retype it later. Ben, would you have the text from before the program reformatted the stuff in the brackets? If you do, could you repost the above after changing the <> to something else?
I tried out the extreme limit method and got these answers.
When Amy takes her hols at the beginning of each month and salvador at the end, the answer is **.518**
When Salvador takes all his hols at the beginning of the month and Amy at the end, the answer is **.486**
These are fairly precise, so most likely the true answer lies between these limits.
The answer should then lie in a range of width .032, which is about 3.2%. That’s a lot of uncertainty…
And I’m still not very sure those are the extreme cases.
Hey Shashank, I tried to fix it – did it work? Sorry about that – the program strips out html code…
thanks ben. you saved me a LOT of effort. thanks.
(it must have taken you some time to fix it though…)
I’ve (finally) started work on it.
Finished February so far.
The probability of Salvador reading the application in february, with Amy reading after salvador is
0.019506.
Still working on the other months…
Correction in February’s number – it is supposed to be 0.018658
January – 0.063617
January & February – 0.82275
0.082775
im posting the numbers again..
ok. i hope its okay
The Answer
Probability – 0.493839
February – 0.027552
January – 0.0938
December – 0.137165
November – 0.235323
I believe these to be accurate.
It is the result of the algorithm I posted earlier, with a few minor changes
Did you delete my post with the answer for some reason?
oh i am quite late!!this is very early post.
someone has already taken my prize!!