Problem 1:
AstraZeneca needs to understand whether there is a discrepancy between how females (“F”) and males (“M”) react to their COVID booster. They sample ten thousand people and record whether each person had no reaction to the booster (“-” : negative reaction) or an obvious reaction (“+” : positive reaction). Column A contains the genders of the 10000 people, and column B their reactions. Determine in cell E1 the number of males, in cell E2 the number of females, in cell E4 the number of positive reactions, and in cell E5 the number of negative reactions. Determine in cell E7 the number of positive reactions among males and in cell E8 the number of positive reactions among females. Determine in cell E10 the proportion of positive reactions among males, and in cell E11 the proportion of positive reactions among females.
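The counts and proportions asked for in E1:E11 can be sketched outside the spreadsheet as follows. This is only an illustration: the function name `summarize` and the six-person sample are hypothetical and are not part of the assignment data.

```python
def summarize(genders, reactions):
    """Compute the counts and proportions the problem places in E1:E11."""
    pairs = list(zip(genders, reactions))
    n_male = sum(1 for g, _ in pairs if g == "M")
    n_female = sum(1 for g, _ in pairs if g == "F")
    n_pos = sum(1 for _, r in pairs if r == "+")
    n_neg = sum(1 for _, r in pairs if r == "-")
    pos_male = sum(1 for g, r in pairs if g == "M" and r == "+")
    pos_female = sum(1 for g, r in pairs if g == "F" and r == "+")
    return {
        "males": n_male,                    # E1
        "females": n_female,                # E2
        "positive": n_pos,                  # E4
        "negative": n_neg,                  # E5
        "positive_males": pos_male,         # E7
        "positive_females": pos_female,     # E8
        "p_male": pos_male / n_male,        # E10
        "p_female": pos_female / n_female,  # E11
    }


# Hypothetical six-person sample, only to show the shape of the computation.
genders = ["M", "F", "M", "F", "M", "F"]
reactions = ["+", "-", "-", "+", "+", "+"]
summary = summarize(genders, reactions)
```

In a spreadsheet the same quantities would typically come from conditional counting functions applied to columns A and B.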
The operating hypothesis is that there is no difference between the proportion of positive reactions among males (pm) and among females (pf). Construct a 90% confidence interval for the difference of the true proportions (pm - pf), given the sample, with the lower bound in E13 and the upper bound in E14.
If 0 is within the confidence interval, enter “no discernable difference” in E16. If 0 is less than the lower bound, we take pm - pf to be positive and enter “men react worse” in E16; if 0 is greater than the upper bound, we take pm - pf to be negative and enter “women react worse” in E16.
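One common way to build such an interval is the normal approximation for a difference of two sample proportions, (pm - pf) ± z·sqrt(pm(1 - pm)/nm + pf(1 - pf)/nf), with z the 95th percentile of the standard normal for a 90% interval. The sketch below assumes that approach; the function names and the example counts are hypothetical, and the course may intend a different construction.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x_m, n_m, x_f, n_f, level=0.90):
    """Normal-approximation CI for pm - pf (assumed method, not confirmed by the source)."""
    p_m = x_m / n_m
    p_f = x_f / n_f
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for a 90% interval
    se = sqrt(p_m * (1 - p_m) / n_m + p_f * (1 - p_f) / n_f)
    diff = p_m - p_f
    return diff - z * se, diff + z * se


def verdict(lower, upper):
    """Map the interval to the text the problem asks for in E16."""
    if lower > 0:
        return "men react worse"
    if upper < 0:
        return "women react worse"
    return "no discernable difference"


# Hypothetical counts: 300 of 5000 males and 250 of 5000 females react positively.
lower, upper = diff_proportion_ci(300, 5000, 250, 5000)
result = verdict(lower, upper)
```

With these made-up counts the whole interval lies above 0, so the verdict is that men react worse; equal sample proportions would always give an interval containing 0.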
Problem 2:
A chip manufacturer observes that chips seem to be failing in two different ways. The manufacturer wants to determine whether the two failures are independent, and believes that they are. After testing a thousand chips, give in E1 the number of failures at site 1, in E2 the number of failures at site 2, and in E3 the number of failures at both sites.
If the two failures are independent, use the proportion of observed failures at site 1 and the proportion of observed failures at site 2 to give in E5 the number of chips expected to fail at both sites from a sample of size one thousand. We will compare the number of failures observed at both sites with the number expected.
Give the lower and upper bounds of a 90% confidence interval for the number of failures at both sites, assuming independence. If the observed number of failures at both sites is within this confidence interval, display “YES” in E10; otherwise display “NO”.
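Under independence, the probability of failing at both sites is the product of the two individual failure proportions, so the expected count in n chips is n·p1·p2. One plausible reading of the interval treats the count as binomial with that probability and uses the normal approximation n·p ± z·sqrt(n·p·(1 - p)). The sketch below follows that reading; the function name and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def both_sites_check(n, fail1, fail2, fail_both, level=0.90):
    """Compare observed joint failures with what independence predicts.

    Assumes a binomial count with a normal-approximation interval.
    """
    p1 = fail1 / n
    p2 = fail2 / n
    p_both = p1 * p2              # independence: P(both) = P(site 1) * P(site 2)
    expected = n * p_both         # the value the problem places in E5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(n * p_both * (1 - p_both))
    lower, upper = expected - half_width, expected + half_width
    answer = "YES" if lower <= fail_both <= upper else "NO"
    return expected, lower, upper, answer


# Hypothetical counts: 100 failures at site 1, 200 at site 2, 25 at both.
expected, lower, upper, answer = both_sites_check(1000, 100, 200, 25)
```

Here the expected count is about 20 and the interval runs from roughly 12.7 to 27.3, so an observed 25 is consistent with independence while an observed 40 would not be.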
Problem 3:
Data is believed to be generated by an exponential random variable. A thousand data values are collected. Determine the average in C2, and compile the data into relative frequencies in F2:F5 for the bins given in E2:E5. Using the average as the mean, give in G2:G5 the proportion of data expected in each bin for an exponential process.
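For an exponential random variable with mean m, the probability of landing in a bin [a, b) is exp(-a/m) - exp(-b/m). The sketch below computes both the observed relative frequencies and these expected proportions. The bin edges 0, 1, 2, 3 and infinity are an assumption made only for illustration, since the actual bins in E2:E5 are not given here, and the function names are hypothetical.

```python
from math import exp, inf


def relative_frequencies(data, edges):
    """Fraction of data values falling in each half-open bin [a, b)."""
    n = len(data)
    return [sum(1 for x in data if a <= x < b) / n
            for a, b in zip(edges, edges[1:])]


def exponential_bin_proportions(mean, edges):
    """Expected proportion per bin for an exponential variable with the given mean."""
    def survival(x):
        # P(X >= x) = exp(-x / mean); an infinite edge has survival 0.
        return 0.0 if x == inf else exp(-x / mean)
    return [survival(a) - survival(b) for a, b in zip(edges, edges[1:])]


# Assumed bin edges (the real ones live in E2:E5) and a tiny made-up data set.
edges = [0, 1, 2, 3, inf]
data = [0.5, 1.5, 0.2, 3.5]
observed = relative_frequencies(data, edges)
sample_mean = sum(data) / len(data)
expected = exponential_bin_proportions(sample_mean, edges)
```

Because the last assumed bin is open-ended, the expected proportions sum to 1, which is a quick sanity check on the bin definitions.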
Since the values in column G are proportions, determine in H2:I5 the lower and upper bounds of 90% confidence intervals for the average of a sample of 1000 Bernoulli random variables with these proportions (i.e., in H2 the lower bound of a 90% confidence interval for the average of a sample of 1000 Bernoulli(G2) random variables, etc.). Then indicate whether the relative frequencies are within these confidence intervals with “YES” or “NO” in J2:J5.
We accept that we do indeed have an exponential random variable if all four relative frequencies lie in their respective confidence intervals. Indicate whether we accept or reject the exponential random variable hypothesis with “ACCEPT” or “REJECT” in L2.
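The average of n Bernoulli(p) variables has mean p and standard deviation sqrt(p(1 - p)/n), so a normal-approximation 90% interval is p ± z·sqrt(p(1 - p)/n). The sketch below assumes that approximation and applies the all-four-bins acceptance rule described above; the function names and the example proportions are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def bernoulli_mean_ci(p, n, level=0.90):
    """Normal-approximation CI for the average of n Bernoulli(p) variables."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def goodness_check(observed, expected, n, level=0.90):
    """Return per-bin YES/NO flags and the overall ACCEPT/REJECT decision."""
    flags = []
    for freq, p in zip(observed, expected):
        lower, upper = bernoulli_mean_ci(p, n, level)
        flags.append("YES" if lower <= freq <= upper else "NO")
    decision = "ACCEPT" if all(flag == "YES" for flag in flags) else "REJECT"
    return flags, decision


# Made-up expected proportions and observed relative frequencies for four bins.
expected = [0.632, 0.233, 0.086, 0.049]
observed = [0.64, 0.23, 0.08, 0.05]
flags, decision = goodness_check(observed, expected, n=1000)
```

Note that requiring all four intervals to hold at once is stricter than a single 90% test, so a true exponential sample will be rejected somewhat more often than 10% of the time.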
Illustrate the relationship between the bounds and the relative frequencies by plotting columns F:I against column E with line plots, placing the chart in C7:J22.