Hello everyone. In this section we will construct a hypothesis test for the difference between two proportions from two different populations. We have one population, which we may call X, and another population, Y, and we want to see whether there is a difference between the proportions of X and Y: a probability of success, a proportion of defective products, and so forth. I take a sample of size $n_X$ from population X and a sample of size $n_Y$ from population Y, and I define the sample proportions as $\hat p_X = X/n_X$ and $\hat p_Y = Y/n_Y$, where X is the number of successes in the first sample and Y is the number of successes in the second sample. From the central limit theorem, if a sample contains at least 10 successes and 10 failures, its sample proportion is approximately normally distributed: $\hat p_X$ has mean $p_X$ and variance $p_X(1-p_X)/n_X$, and likewise for $\hat p_Y$. If I construct a new random variable, the difference $\hat p_X - \hat p_Y$, it is also approximately normal, with mean $p_X - p_Y$, and because the two samples are independent its variance is the sum of the two variances, $p_X(1-p_X)/n_X + p_Y(1-p_Y)/n_Y$. We have studied this earlier.
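To see the central limit theorem claim in action, here is a small Python simulation sketch. It assumes hypothetical true proportions $p_X = 0.85$ and $p_Y = 0.82$ with sample sizes 200 and 150 (values chosen only for illustration), draws many pairs of independent samples, and compares the spread of $\hat p_X - \hat p_Y$ with the variance formula. The function names `simulate_difference` and `theoretical_variance` are made up for this sketch.

```python
import random
import statistics


def simulate_difference(p_x, p_y, n_x, n_y, trials=5000, seed=1):
    """Draw `trials` pairs of independent samples and return p_hat_x - p_hat_y for each pair."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    diffs = []
    for _ in range(trials):
        # Count successes: each draw is a success with probability p.
        x = sum(rng.random() < p_x for _ in range(n_x))
        y = sum(rng.random() < p_y for _ in range(n_y))
        diffs.append(x / n_x - y / n_y)
    return diffs


def theoretical_variance(p_x, p_y, n_x, n_y):
    """Variance of p_hat_x - p_hat_y: the sum of the two variances (independent samples)."""
    return p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y


# Illustrative (assumed) true proportions and sample sizes.
diffs = simulate_difference(0.85, 0.82, 200, 150)
print("simulated mean:", statistics.mean(diffs))          # should be close to 0.03
print("simulated variance:", statistics.variance(diffs))  # should be close to the formula
print("formula variance:", theoretical_variance(0.85, 0.82, 200, 150))
```

The simulated mean and variance will not match the formula exactly, since the simulation has its own sampling noise, but they should agree to within a few percent.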
The only difference now is in the hypothesis test. If I set my null hypothesis $H_0$ to say there is no difference between the two proportions, then I am assuming that the proportion of population X is the same as the proportion of population Y. In that case, what is the variance of $\hat p_X - \hat p_Y$? The variance above is correct when there is no constraint on $p_X - p_Y$, but if I constrain the two proportions to be equal, should I use $\hat p_X$ or $\hat p_Y$ in the variance? Since I'm assuming they are the same, we use the pooled proportion, which is the total number of successes in both samples divided by the total sample size: $\hat p = (X + Y)/(n_X + n_Y)$. So when we calculate the variance under this null hypothesis, we replace both $p_X$ and $p_Y$ by $\hat p$. If the null hypothesis assumes the proportions are the same, the mean of $\hat p_X - \hat p_Y$ is zero, and instead of $p_X(1-p_X)/n_X + p_Y(1-p_Y)/n_Y$ the variance becomes $\hat p(1-\hat p)/n_X + \hat p(1-\hat p)/n_Y$. Factoring out $\hat p(1-\hat p)$ gives $\hat p(1-\hat p)(1/n_X + 1/n_Y)$. This is the normal distribution of the difference in proportions assuming they are the same, because that is the null hypothesis we will be using.
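Here is a minimal Python sketch of the pooled proportion and the pooled standard deviation of $\hat p_X - \hat p_Y$ under $H_0$. The helper names `pooled_proportion` and `pooled_standard_error` are hypothetical; the formulas are the ones described in this section.

```python
import math


def pooled_proportion(x, n_x, y, n_y):
    """Pooled estimate of the common proportion under H0: p_x = p_y."""
    return (x + y) / (n_x + n_y)


def pooled_standard_error(x, n_x, y, n_y):
    """Standard deviation of p_hat_x - p_hat_y with both proportions replaced by the pooled p_hat."""
    p_hat = pooled_proportion(x, n_x, y, n_y)
    # p_hat(1 - p_hat)/n_x + p_hat(1 - p_hat)/n_y, with p_hat(1 - p_hat) factored out
    return math.sqrt(p_hat * (1 - p_hat) * (1 / n_x + 1 / n_y))


print(pooled_proportion(170, 200, 123, 150))      # about 0.837
print(pooled_standard_error(170, 200, 123, 150))  # about 0.0399
```

For instance, with 170 successes out of 200 and 123 out of 150, the pooled proportion is about 0.837 and the pooled standard deviation is about 0.0399.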
With that said, here is the hypothesis test. Let X and Y be binomial: we have two populations, for example a manufacturing process where each product is either a success or a failure. I take a sample from each population, of sizes $n_X$ and $n_Y$, and according to the central limit theorem, if each sample has at least 10 successes and 10 failures, the sample proportions are approximately normal. We compute $\hat p_X = X/n_X$, $\hat p_Y = Y/n_Y$, and the pooled proportion $\hat p = (X+Y)/(n_X+n_Y)$. The z-score is then $z = \frac{\hat p_X - \hat p_Y - 0}{\sqrt{\hat p(1-\hat p)(1/n_X + 1/n_Y)}}$. The zero is there because, as before with the mean, the hypothesized difference $\Delta_0$ could be any number, but here we are assuming there is no difference between the proportions, so $\Delta_0 = 0$ and we don't need to write it. The denominator is the standard deviation of $\hat p_X - \hat p_Y$ computed with the pooled proportion.
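Putting the pieces together, the hypothetical helper `two_proportion_z` below computes the z-score. It refuses to compute z when a sample has fewer than 10 successes or 10 failures, mirroring the rule of thumb for the normal approximation; raising an error is just one design choice for this sketch, and a warning would also work.

```python
import math


def two_proportion_z(x, n_x, y, n_y):
    """z statistic for H0: p_x - p_y = 0, using the pooled proportion.

    Raises ValueError if either sample has fewer than 10 successes or 10 failures,
    the rule of thumb used here for the normal approximation.
    """
    for successes, n in ((x, n_x), (y, n_y)):
        if successes < 10 or n - successes < 10:
            raise ValueError("need at least 10 successes and 10 failures in each sample")
    p_x_hat = x / n_x
    p_y_hat = y / n_y
    p_hat = (x + y) / (n_x + n_y)  # pooled proportion
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n_x + 1 / n_y))
    return (p_x_hat - p_y_hat - 0) / se  # the hypothesized difference is 0


print(round(two_proportion_z(170, 200, 123, 150), 3))  # 0.752
```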
We set the null and alternative hypotheses using the same rules we used earlier. We calculate $\hat p_X - \hat p_Y$, and this value could be zero, positive, or negative. If it is positive, it lands to the right of zero, so the alternative hypothesis is $H_1: p_X - p_Y > 0$ and the p-value is the area to the right of the z-score. The null hypothesis is the opposite of $H_1$, namely $H_0: p_X - p_Y \le 0$; the null hypothesis always opposes the evidence of the sample.
In the same way, if $\hat p_X - \hat p_Y$ is negative, the alternative is $H_1: p_X - p_Y < 0$ and the p-value is the area to the left of the z-score. If the alternative is that the proportions are not equal, $H_1: p_X - p_Y \ne 0$, then the null hypothesis is $H_0: p_X - p_Y = 0$, and the p-value is two-tailed: wherever z falls, on the left or the right, you add up the areas in both tails beyond $\pm|z|$.
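The three p-value rules can be written as one small function. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+) for the standard normal CDF in place of a printed z table; the function name `p_value` and the labels "greater", "less", and "two-sided" are my own naming.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """P-value for a z statistic.

    alternative: "greater"   (H1: p_x - p_y > 0, right tail),
                 "less"      (H1: p_x - p_y < 0, left tail),
                 "two-sided" (H1: p_x - p_y != 0, both tails).
    """
    phi = NormalDist().cdf  # standard normal CDF
    if alternative == "greater":
        return 1 - phi(z)
    if alternative == "less":
        return phi(z)
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


print(p_value(0.752, "greater"))    # about 0.226
print(p_value(1.96, "two-sided"))   # about 0.05
```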
Now let's take some examples of testing the difference between two proportions and reaching a conclusion. A mobile computer network consists of computers that maintain wireless communication with one another as they move about a given area. A routing protocol is an algorithm that determines how messages will be relayed from machine to machine along the network so as to have the greatest chance of reaching their destination. We have two protocols, and we want to compare them to see which one gives a higher success rate for transmitting the data. Let's call protocol A population X and protocol B population Y. Assume that using protocol A, 200 messages were sent and 170 of them were successfully received, so $n_X = 200$ and $X = 170$. Using protocol B, 150 messages were sent and 123 of them were successfully received, so $n_Y = 150$ and $Y = 123$. The estimated proportion for X is $\hat p_X = 170/200 = 0.85$, and for Y it is $\hat p_Y = 123/150 = 0.82$. Before we decide on the null hypothesis, let's calculate the pooled proportion: $\hat p = (170 + 123)/(200 + 150) = 293/350 \approx 0.837$.
Now I have everything I need to calculate the z-score, but before that we have to set the null hypothesis. To do that, let's see what kind of evidence the sample gives us: $\hat p_X - \hat p_Y = 0.85 - 0.82 = 0.03$, which lies slightly to the right of zero. The evidence shows $p_X$ larger than $p_Y$, so the null hypothesis goes against the evidence: $H_0: p_X - p_Y \le 0$, and the alternative hypothesis is the opposite, $H_1: p_X - p_Y > 0$. So the null hypothesis says protocol A is no better than protocol B. Even though the sample shows some evidence that protocol A is better than B, the null hypothesis assumes the difference is just sampling noise: the two protocols are the same, or A is even worse.
Now we have to calculate the z-score to find the p-value. If the p-value is very small, we have enough evidence to reject the null hypothesis; if it is larger than 5 or 10 percent, we cannot reject the null hypothesis. The z-score is $z = \frac{0.85 - 0.82}{\sqrt{0.837(1 - 0.837)(1/200 + 1/150)}} = \frac{0.03}{0.0399} \approx 0.752$. Then we go to the standard normal (z) table and look up $z = 0.752$; the area to the right of it gives a p-value of about 0.226, that is, 22.6 percent. So there is no strong evidence against the null hypothesis: I cannot reject the null hypothesis, and I cannot conclude that protocol A has the greater success rate.
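As a check on the hand calculation, here is the protocol example as a self-contained Python sketch. It keeps full precision instead of rounding the pooled proportion to 0.837, and uses `statistics.NormalDist` instead of a z table, but it lands on the same z-score and p-value.

```python
import math
from statistics import NormalDist

# Example 1: protocol A (population X) vs protocol B (population Y).
n_x, x = 200, 170   # messages sent / received with protocol A
n_y, y = 150, 123   # messages sent / received with protocol B

p_x_hat = x / n_x                   # 0.85
p_y_hat = y / n_y                   # 0.82
p_hat = (x + y) / (n_x + n_y)       # pooled proportion, about 0.837
se = math.sqrt(p_hat * (1 - p_hat) * (1 / n_x + 1 / n_y))
z = (p_x_hat - p_y_hat) / se
p = 1 - NormalDist().cdf(z)         # H1: p_x - p_y > 0, so right tail

print(f"z = {z:.3f}, p-value = {p:.3f}")  # z = 0.752, p-value = 0.226
```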
Let's look at the next example. The article "Factors Associated with Exercise Behavior in People with Parkinson Disease" reported a study of patients with Parkinson's disease. Of 164 patients who said they exercised regularly, 76 reported falling in the previous six months. Of 96 patients who said they did not exercise regularly, 48 reported falling in the previous six months. Let's define our populations: population X is patients who exercise, and population Y is patients who don't exercise. So $n_X = 164$, $X = 76$, $n_Y = 96$, and $Y = 48$. Can you conclude that the proportion of patients who fall is less for those who exercise than for those who don't exercise?
First, the proportion of falling for patients who exercise is $\hat p_X = 76/164 \approx 0.463$, and the proportion of falling for patients who don't exercise is $\hat p_Y = 48/96 = 0.5$. Before we set the null hypothesis, let's calculate the pooled proportion: the number of patients who fell in both samples divided by the total sample size, $\hat p = (76 + 48)/(164 + 96) = 124/260 \approx 0.4769$. Now, to set the null hypothesis, let's see what the sample says: $\hat p_X - \hat p_Y = 0.463 - 0.5 = -0.037$, which lies slightly to the left of zero. The evidence shows a difference less than zero, so the null hypothesis is $H_0: p_X - p_Y \ge 0$. That means exercising makes no difference (or even increases falling): the proportion of falling is the same whether patients exercise or not. Even though in the sample the patients who exercise fall less, 0.463 versus 0.5, the difference is small, and the null hypothesis says there is no difference, or the difference is even positive. The alternative hypothesis is then $H_1: p_X - p_Y < 0$.
Now I have to calculate the z-score: $z = \frac{0.463 - 0.5}{\sqrt{0.4769(1 - 0.4769)(1/164 + 1/96)}} = \frac{-0.037}{0.0642} \approx -0.576$. With the unrounded difference $76/164 - 0.5 \approx -0.0366$, the z-score is about $-0.57$. Looking up $z = -0.57$ in the standard normal table, the p-value (the area to the left) is 0.2843, that is, 28.43 percent. So there is no strong evidence to reject the null hypothesis, and my conclusion is that there is no significant difference in the proportion of falling between patients with Parkinson's disease who exercise and those who don't: from this study, we cannot conclude that exercise reduces falling.
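The Parkinson's example can be checked the same way with a self-contained Python sketch. Carrying full precision gives z of about -0.570 rather than -0.576 from the rounded proportions, which matches the table value 0.2843 for $z = -0.57$. Here "success" means a patient fell, so the large-sample condition holds (76 and 88 in the first sample, 48 and 48 in the second).

```python
import math
from statistics import NormalDist

# Example 2: patients with Parkinson's disease who exercise (X) vs. who don't (Y).
n_x, x = 164, 76   # patients who exercise / how many of them fell
n_y, y = 96, 48    # patients who don't exercise / how many of them fell

p_x_hat = x / n_x                   # about 0.463
p_y_hat = y / n_y                   # 0.5
p_hat = (x + y) / (n_x + n_y)       # pooled proportion, about 0.4769
se = math.sqrt(p_hat * (1 - p_hat) * (1 / n_x + 1 / n_y))
z = (p_x_hat - p_y_hat) / se
p = NormalDist().cdf(z)             # H1: p_x - p_y < 0, so left tail

print(f"z = {z:.3f}, p-value = {p:.4f}")  # z = -0.570, p-value = 0.2843
```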
This concludes section 6.6. Our next topic will be section 6.7, which is building a hypothesis test for the difference between two means when we have only small samples. Thank you.
This video explains the hypothesis test for the difference between two proportions of two populations from large samples. It explains how to set the null hypothesis and calculate the z-statistic and the p-value, so you can decide whether to reject the null hypothesis. Several practical examples are solved to demonstrate the concepts.