This assignment must be completed in RStudio. The questions to answer are in the SDM applied project below, and the dataset is in the attached Excel file. A screenshot of the RStudio work needs to be attached to each relevant question.
The College
APPLIED PROJECT – TERM 3 2021
STUDENT NAME STUDENT ID SIGNATURE
UNIT NAME: Statistical Decision Making
UNIT NUMBER: MATH1029
NUMBER OF QUESTIONS: 4
VALUE OF QUESTIONS: Applied project questions are worth 30 marks in total.
ANSWERING QUESTIONS: This applied project is to be completed using RStudio. All handwritten answers are to be written in the space provided in this project.
LECTURER/UNIT COORDINATOR: Rubie Herrera/Hima Withana
Michael Casey
DUE DATE: Week 11: Friday, 21 January 2022, 11:59pm
TOTAL PAGES: 8
INSTRUCTIONS TO STUDENTS
• The file containing the data sets for the following questions can be downloaded from the e-learning site vUWS.
• You should use RStudio to carry out all calculations and statistical analysis which must be shown in the file to be submitted on vUWS. You will be required to submit the R-Script used to perform all the calculations for all the questions. This R-Script should run without errors.
• All answers in this booklet must match the calculations in your submitted R-Script.
• To complete this project, you must provide the RStudio outputs for each of the questions.
• Individual projects submitted after the due date will attract a late penalty in accordance with the late penalty policy stated on the Western Sydney University – The College Website.
To upload your R-Script file, follow these instructions:
a. Login to “vUWS”.
b. Select the Statistical Decision Making folder.
c. Under the Assessment tab (on the left-hand side of the screen), select “Assessment 4: Applied Project (10%)” and select your group.
d. Use “Add attachment” to select the file to be submitted.
e. Double-check to make sure you have the correct file.
f. “Submit” your file.
g. A written hard copy of your applied project containing the solutions to each question must be handed in by the due date of this project.
Description of the assignment and data
The assignment centres around a study of drinking habits of young adults.
The data stored in the Microsoft Excel file named SDM AppliedProject 2021.3.xlsx is to be used for the assignment. It can be downloaded from vUWS.
You are to use the data set assigned to you by your teacher to answer all the questions in this assignment.
The data in this file were gathered from young adults aged between 18 and 30. The participants were surveyed about their drinking behaviour over the previous weeks, both when drinking alone and when drinking with a romantic partner.
The data set comprises the following variables:
Gender: Gender of the participants (1 = Female, 2 = Male)
Age: Age of the participants, in years
Alcohol (Drink Alone): Amount of alcohol consumed per week by the participants when drinking alone, in standard drinks
Alcohol (Drink with Partner): Amount of alcohol consumed per week by the participants when drinking with a romantic partner, in standard drinks
(A standard drink is any drink containing 10 grams of alcohol.)
NOTE: The data were randomly generated for the sole purpose of this assignment.
Question 1 (6 marks) Marks
In this question you are to investigate the distribution of alcohol consumption when drinking alone, using descriptive statistics and a visual depiction of your results.
a) Using RStudio, obtain the following descriptive measures for alcohol consumption when drinking alone. * [2]
i) Mean
ii) Median
iii) Standard deviation
iv) Range
b) Using RStudio, construct a histogram of alcohol consumption when drinking alone, with 5 classes.* [2]
You must clearly label the axes and title the graph. [2]
c) Using RStudio, construct a 90% confidence interval for the population mean alcohol consumption when drinking alone*. [2]
* Evidence of work in RStudio is required
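The steps in Question 1 could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the vector `alone` holds made-up values standing in for the "Alcohol (Drink Alone)" column, the variable names are hypothetical, and the confidence interval is taken from a one-sample t procedure (population standard deviation unknown). In the actual project the vector would come from the assigned data set in the Excel file.

```r
# Hypothetical stand-in for the "Alcohol (Drink Alone)" column (made-up values,
# NOT the project data). In practice this would be read from the Excel file.
alone <- c(2, 5, 7, 3, 8, 4, 6, 9, 1, 5, 7, 4)

# a) Descriptive measures
mean_alone   <- mean(alone)
median_alone <- median(alone)
sd_alone     <- sd(alone)            # sample standard deviation (divides by n - 1)
range_alone  <- diff(range(alone))   # maximum minus minimum

# b) Histogram with 5 classes. Explicit, equal-width break points are used
#    because a single number passed to `breaks` is only a suggestion to hist().
class_breaks <- seq(min(alone), max(alone), length.out = 6)
h <- hist(alone,
          breaks = class_breaks,
          include.lowest = TRUE,
          main = "Weekly alcohol consumption when drinking alone",
          xlab = "Standard drinks per week",
          ylab = "Number of participants")

# c) 90% confidence interval for the population mean (t-based)
ci_90 <- t.test(alone, conf.level = 0.90)$conf.int

print(c(mean = mean_alone, median = median_alone,
        sd = sd_alone, range = range_alone))
print(ci_90)
```

Passing six break points gives exactly five classes; whether the classes should instead start and end on "round" values is a presentation choice the brief leaves open.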
Question 2 (8 marks)
It is believed that to reduce the risk of harm from alcohol-related disease or injury, healthy men and women should drink no more than 6 standard drinks a week.
In this question you are to investigate if the average alcohol consumption when drinking alone is within the healthy limit, by carrying out a hypothesis test.
a) Using RStudio, obtain an output for a hypothesis test at a 5% level of significance to determine if the mean alcohol consumption when drinking alone is less than 6 standard drinks per week. (Assume that the amount of alcohol consumed when drinking alone is normally distributed.)* [2]
b) Using the information in the output, conduct a hypothesis test at a 5% level of significance to determine if the mean alcohol consumption when drinking alone is less than 6 standard drinks per week:
i) State the null and alternative hypotheses [1]
ii) State the test statistic including the degrees of freedom [1]
iii) State the decision rule [1]
iv) State the p-value [1]
v) State the decision, giving reasons [1]
vi) Write down the conclusion [1]
* Evidence of work in RStudio is required
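A test of the kind described in Question 2 is commonly run as a one-sample, lower-tailed t-test. The sketch below assumes that reading (H0: mean = 6 against H1: mean < 6) and again uses a made-up vector `alone` with hypothetical variable names, so the numbers it prints say nothing about the real data.

```r
# Hypothetical stand-in for the "Alcohol (Drink Alone)" column (made-up values).
alone <- c(2, 5, 7, 3, 8, 4, 6, 9, 1, 5, 7, 4)
alpha <- 0.05

# One-sample t-test of H0: mu = 6 against H1: mu < 6
test_q2 <- t.test(alone, mu = 6, alternative = "less")
print(test_q2)

# Pieces of the output that parts i) to vi) refer to
t_stat  <- unname(test_q2$statistic)   # test statistic
df      <- unname(test_q2$parameter)   # degrees of freedom (n - 1)
p_value <- test_q2$p.value

# Decision rule for a lower-tailed test: reject H0 if t_stat < t_crit,
# which is equivalent to p_value < alpha.
t_crit    <- qt(alpha, df)
reject_h0 <- p_value < alpha
print(c(t = t_stat, df = df, t_crit = t_crit, p = p_value))
print(reject_h0)
```

The critical-value rule and the p-value rule always lead to the same decision, so either can be quoted as the reason in part v).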
Question 3 (8 marks)
In this question you are to investigate if there is a significant difference between the alcohol consumption when drinking alone and when drinking with a partner, by carrying out a hypothesis test for the paired difference.
a) Using RStudio, obtain an output for a hypothesis test at a 10% level of significance. Assume that the paired differences are normally distributed. * [2]
b) Using the information in the output, conduct a hypothesis test at a 10% level of significance to determine if alcohol consumption differs significantly between drinking alone and drinking with a partner:
i) State the null and alternative hypotheses [1]
ii) State the test statistic including the degrees of freedom [1]
iii) State the decision rule [1]
iv) State the p-value [1]
v) State the decision, giving reasons [1]
vi) Write down the conclusion [1]
* Evidence of work in RStudio is required
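For Question 3, a paired-difference test can be sketched with `t.test(..., paired = TRUE)`. The two vectors below are made-up stand-ins for the "Alcohol (Drink Alone)" and "Alcohol (Drink with Partner)" columns, the variable names are hypothetical, and the test is assumed to be two-sided because the question asks whether consumption "differs".

```r
# Hypothetical stand-ins for the two alcohol columns (made-up values).
# Each position is the same participant in both vectors.
alone   <- c(2, 5, 7, 3, 8, 4, 6, 9, 1, 5, 7, 4)
partner <- c(3, 6, 6, 5, 9, 4, 8, 10, 2, 7, 7, 6)
alpha   <- 0.10

# Paired t-test of H0: mean difference = 0 against H1: mean difference != 0
test_q3 <- t.test(alone, partner, paired = TRUE, conf.level = 1 - alpha)
print(test_q3)

t_stat  <- unname(test_q3$statistic)
df      <- unname(test_q3$parameter)   # number of pairs minus 1
p_value <- test_q3$p.value

# Two-sided decision rule: reject H0 if |t_stat| > t_crit,
# which is equivalent to p_value < alpha.
t_crit    <- qt(1 - alpha / 2, df)
reject_h0 <- p_value < alpha
print(c(t = t_stat, df = df, t_crit = t_crit, p = p_value))
print(reject_h0)
```

The same statistic is obtained from a one-sample t-test on the differences `alone - partner`, which can be a useful cross-check.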
Question 4 (8 marks)
In this question you are to investigate if there is a correlation between age and alcohol consumption (when drinking alone), by performing a regression analysis.
a) Using RStudio, obtain a linear regression output. You must include both the graphical display and statistical analysis summary. * [2]
b) By examining the scatter plot, discuss if there is a correlation between age and alcohol consumption (when drinking alone). [2]
c) State the coefficient of determination. [1]
d) Obtain the correlation coefficient. [1]
e) Discuss the correlation between age and alcohol consumption (when drinking alone) by referring to the numerical values obtained in part c) and part d). [2]
* Evidence of work in RStudio is required
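The regression output asked for in Question 4 could be produced along the following lines. The sketch assumes age is the explanatory variable and alcohol consumption the response, and it uses made-up `age` and `alone` vectors with hypothetical names rather than the project data.

```r
# Hypothetical stand-ins for the "Age" and "Alcohol (Drink Alone)" columns
# (made-up values).
age   <- c(18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
alone <- c(2, 5, 7, 3, 8, 4, 6, 9, 1, 5, 7, 4)

# a) Graphical display: scatter plot with the fitted least-squares line
fit <- lm(alone ~ age)
plot(age, alone,
     main = "Alcohol consumption when drinking alone vs age",
     xlab = "Age (years)",
     ylab = "Standard drinks per week")
abline(fit)

# a) Statistical analysis summary
fit_summary <- summary(fit)
print(fit_summary)

# c) Coefficient of determination (R-squared)
r_squared <- fit_summary$r.squared

# d) Correlation coefficient; its sign matches the sign of the slope
r <- cor(age, alone)
print(c(r_squared = r_squared, r = r))
```

In simple linear regression the coefficient of determination equals the square of the correlation coefficient, so the two values in parts c) and d) should agree with each other.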
END OF PROJECT