Assignment 2 – Statistical Methods
(MATH 1068) SP2 2021
Instructions
• This assignment is worth 20% of your final grade and is due at 5pm Friday 4th of June.
• Submission is online and through the Learnonline website only.
• Assignments will be marked and returned online.
• Minitab output will be required in some sections. To avoid losing any information when uploaded, the file submitted should be a PDF document.
• The marks for each question are displayed next to the question.
• It is important that you follow any instructions or guidance in the questions, such as “Use Minitab” where required.
• The assignment consists of a total of 75 marks.
• Any late submission will attract a penalty of 10% of the maximum marks available for each day late. The cut-off time is 5pm each day.
Question 1 (12 Marks)
Wave Data: The data were collected in a study comparing two different mooring methods, to analyse the amount of electricity generated from waves at sea. The difference between the two mooring methods is the effect on the bending stress in part of the device, where Method 1 had cheaper components in the system. Both methods were applied to the same subjects under the same sample sea states. The question of interest is whether the bending stress differs for the two mooring methods.
You will need the MINITAB worksheet called waves.MTW for this question, which you can download from the Data files folder within the course website.
For full marks, ensure that appropriate axis labels and meaningful titles are included with all your graphical displays for this question.
(a) (2 marks) Set up the null and alternative hypotheses for this problem.
(b) (2 marks) Use MINITAB to calculate the sample mean, sample standard deviation and test for Normality.
(c) (8 marks) Statistically test at a 10% level of significance whether the two mooring methods are significantly different. Include a conclusion in your answer. All calculations should be done without using MINITAB, and a diagram should be included.
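Because both methods are applied to the same subjects, a hand calculation for part (c) would typically work with the paired differences. The following is a minimal sketch of that arithmetic, assuming a paired t-test is appropriate. The readings are hypothetical and are not taken from waves.MTW, and the critical value is read from a standard t table for the hypothetical sample size used here.

```python
import math
import statistics

# Hypothetical bending-stress readings, NOT taken from waves.MTW.
method_1 = [10, 12, 9, 11, 13, 10]
method_2 = [8, 11, 9, 9, 12, 8]

# Paired design: work with the per-subject differences.
differences = [a - b for a, b in zip(method_1, method_2)]
n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample SD (n - 1 divisor)

# Test statistic for H0: mean difference = 0.
standard_error = sd_diff / math.sqrt(n)
t_stat = mean_diff / standard_error

# Two-sided 10% test: critical value t(0.05, df = 5) from a standard t table.
T_CRITICAL = 2.015
reject_null = abs(t_stat) > T_CRITICAL

print(f"mean difference = {mean_diff:.4f}, SD = {sd_diff:.4f}")
print(f"t = {t_stat:.3f}, df = {n - 1}, reject H0 at 10%: {reject_null}")
```

With real data the critical value must be looked up again for the actual degrees of freedom, and the diagram would mark the rejection regions on either side of zero.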
Question 2 (32 Marks)
Ore Data: In a study to investigate the concentrations of copper in ore, the data in the dataset ore.MTW were collected. The data represent three independent locations, the average size per mining site and the Copper Concentration (kgs). A higher Copper Concentration physically translates to the site being older and richer in minerals.
(a) (8 marks) A mining company claims that for a site to be financially viable, the Copper Concentration level at the site should be above 60. At the 5% level of significance, test whether the average copper concentration is above 60 kgs.
(b) (10 marks) Check the assumptions for ANOVA and test at the 5% significance level to determine whether there is a difference in Copper Concentration between the different regions. Include the post-hoc Tukey test to identify which pairs are different, if appropriate. Include your MINITAB output for full marks.
(c) (3 marks) By examining the patterns in the data, make a recommendation about which regions have higher Copper concentrations. Summarise the results of your analyses from parts (a) to (c) (3-4 sentences).
(d) (8 marks) By using the table of counts in the ore.MTW dataset, conduct an appropriate hypothesis test to determine whether the Site Area is related to the Region at a 5% level of significance.
(e) (3 marks) What conclusions can be drawn from the residuals in the table? State all the significant residuals in the answer.
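For parts (d) and (e), one common approach to a table of counts is a chi-square test of independence, with cell residuals used to see which cells drive the result. The sketch below assumes that approach and uses Pearson residuals, (observed - expected) / sqrt(expected); the residuals reported by MINITAB may be defined differently. The counts are hypothetical and are not taken from ore.MTW.

```python
import math

# Hypothetical counts, NOT taken from ore.MTW.
# Rows: two site-area categories; columns: three regions.
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson residual per cell; their squares sum to the chi-square statistic.
residuals = [
    [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square = sum(r ** 2 for row in residuals for r in row)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical value chi-square(0.05, df = 2) read from a standard table.
CHI2_CRITICAL = 5.991
reject_null = chi_square > CHI2_CRITICAL

print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0 at 5%: {reject_null}")
for row in residuals:
    print([round(r, 2) for r in row])
```

A frequently used rule of thumb, stated here as an assumption rather than a course requirement, treats residuals with absolute value above about 2 as notable.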
Question 3 (31 Marks)
World Health Data: The dataset healthUS.MTW contains data from the US Department of Health and Human Services, National Centre for Health Statistics and 3rd National Health & Nutrition Examination Survey. The data contains numerical variables which are BMI (kg/m^2), Age (years), Height (cm), Weight (kg), Cholesterol (mg) and Waist (cm). It is important to analyse the data, make predictions and draw conclusions on the association between the various variables.
You will need the MINITAB worksheet called healthUS.MTW for this question, which you can download from the Data files folder within the course website.
For full marks, ensure that appropriate axis labels and meaningful titles are included with all your graphical displays for this question.
(a) (4 marks) Use Minitab to compute the correlation coefficients for BMI (kg/m^2) with each other variable. Discuss each case, providing the value for the correlation coefficient and an interpretation of the value. What is the best predictor variable to analyse BMI (kg/m^2) and why? Why would you not choose Weight (kg) to predict BMI?
(b) (2 marks) Use Minitab to produce a scatterplot for BMI (kg/m^2) versus your best predictor variable from part (a). Describe the relationship observed from the display.
(c) (6 marks) Run a linear regression model in MINITAB. Are the requirements for a linear regression model satisfied? The requirements to verify are Linearity, Independence, Normality and Equal population standard deviations. For full marks attach your MINITAB output.
(d) (2 marks) What is the value of the intercept and is the value significant? Is the intercept meaningful?
(e) (3 marks) What is the value of the slope? What does the slope measure in this scenario?
(f) (3 marks) What is the value of the coefficient of determination? What precisely does it measure in this example?
(g) (6 marks) The regression model from the MINITAB output predicts the BMI given your choice of predictor variable. Calculate the average of your predictor variable and use the regression model for this value to predict the BMI (kg/m^2) on average. Is this prediction accurate? Give reasons.
(h) (5 marks) Based on your answer from part (g), use MINITAB to calculate the 95% prediction interval and the margin of error for the predicted value in (g). State the formula for the 95% prediction interval and interpret the meaning of the margin of error. For full marks attach your MINITAB output.
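The quantities asked for in parts (d) to (h) can be illustrated with a small least-squares calculation. This is a sketch under stated assumptions, not a replacement for the required MINITAB output: the (x, BMI) pairs are hypothetical and not taken from healthUS.MTW, x stands for whichever predictor is chosen, and the prediction interval uses the standard simple linear regression form y_hat ± t* · s · sqrt(1 + 1/n + (x0 - x_bar)^2 / S_xx).

```python
import math

# Hypothetical (predictor, BMI) pairs, NOT taken from healthUS.MTW.
x = [70, 80, 90, 100, 110]
y = [20, 24, 25, 28, 33]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual standard error and coefficient of determination.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst
s = math.sqrt(sse / (n - 2))

# Predict at the mean of the predictor, as in part (g).
x0 = x_bar
y_hat = intercept + slope * x0

# 95% prediction interval: critical value t(0.025, df = 3) from a t table.
T_CRITICAL = 3.182
margin = T_CRITICAL * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
lower, upper = y_hat - margin, y_hat + margin

print(f"BMI = {intercept:.3f} + {slope:.3f} * x, R^2 = {r_squared:.4f}")
print(f"prediction at x0 = {x0}: {y_hat:.2f} +/- {margin:.2f}")
```

The margin of error here is the half-width of the interval. With real data the critical value depends on n - 2 degrees of freedom and must be looked up again.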