This assignment covers descriptive statistics and is worth a total of 30 points. Be sure that the work you submit is in your own words. Attach your R code at the end as an appendix. Make sure that any plots you create are readable and clearly labeled.
1. Which of the following sets of scores has the greatest variability (use range)? (1 point)
a. 2, 5, 8, 11
b. 13, 13, 13, 13
c. 20, 25, 26, 27
d. 42, 43, 44, 45
2. Which of the following is not a measure of central tendency? (1 point)
a. Mode
b. Interquartile range
c. Mean
d. Median
3. Which measure of central tendency should be reported when an unusual data value (outlier) is present in the data set? (1 point)
a. Mode
b. Median
c. Mean
4. Which measure of variability should be reported when an unusual data value (outlier) is present in the data set? (1 point)
a. Range
b. Interquartile range
c. Standard deviation
d. Variance
5. Health-conscious Americans often consult the nutritional information on food packages in an attempt to avoid foods with large amounts of fat, sodium, or cholesterol. The following information was taken from eight different brands of American cheese slices: (8 points)
Enter the data into Excel (if you like, you may enter only the two variables needed for these questions: brand and calories). Load the Excel data into R and use the R code you learned last week to answer the following questions.
a. Describe the distribution of Calories using an appropriate graph. Comment on the shape of the distribution, as well as the range. (4 points)
b. Calculate and interpret the sample mean, median value, and the sample standard deviation (s) for Calories. (4 points)
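As a minimal sketch of the workflow for question 5, the R code below builds a small data frame, draws a labeled histogram, and computes the range, mean, median, and sample standard deviation. The calorie values are made-up placeholders, not the values from the cheese table, and the column names `brand` and `calories` are assumptions. In the assignment itself the data would be loaded from your Excel file using the code from last week's materials.

```r
# Hypothetical stand-in data: these calorie values are placeholders,
# NOT the values from the assignment table.
cheese <- data.frame(
  brand    = paste("Brand", LETTERS[1:8]),
  calories = c(50, 60, 70, 70, 80, 80, 90, 110)
)

# Histogram with a title and axis labels so the plot is readable
hist(cheese$calories,
     main = "Calories per slice (placeholder data)",
     xlab = "Calories",
     ylab = "Number of brands")

# Range as a single number: maximum minus minimum
calorie_range <- diff(range(cheese$calories))

# Sample mean, median, and sample standard deviation (n - 1 denominator)
calorie_mean   <- mean(cheese$calories)
calorie_median <- median(cheese$calories)
calorie_sd     <- sd(cheese$calories)

print(c(range = calorie_range, mean = calorie_mean,
        median = calorie_median, sd = calorie_sd))
```

Note that `sd()` in base R divides by n - 1, which matches the sample standard deviation (s) requested in part b.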
6. Using the same dataset as last week (practice1.csv), follow the R code provided at the end of this week’s PPT files to complete the following tasks. (18 points)
a. Run frequencies for the following categorical (i.e., discrete) variables: Gender, smoking, and cancer (8 points). Answer the following questions:
What percentage of the sample is female? ____________
What percentage of the sample are current smokers? ____________
What percentage of the sample is most active? ____________
What percentage of the sample developed cancer? ____________
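One possible way to turn frequency counts into percentages is sketched below with base R's `table()` and `prop.table()`. The data frame is a hypothetical stand-in for practice1.csv: the rows, the category labels (for example "Current"), and the exact column names are assumptions, and the code in the PPT files may take a different approach.

```r
# Hypothetical stand-in for practice1.csv; the real file would be loaded
# with read.csv("practice1.csv"), and its column names and codes may differ.
practice <- data.frame(
  Gender  = c("Female", "Male", "Female", "Female", "Male"),
  smoking = c("Current", "Never", "Former", "Never", "Current"),
  cancer  = c("No", "No", "Yes", "No", "No")
)

# Counts per category
gender_counts <- table(practice$Gender)

# Convert counts to percentages of the whole sample, rounded to 1 decimal
gender_pct <- round(100 * prop.table(gender_counts), 1)
print(gender_pct)

# The same pattern applies to the other categorical variables
smoking_pct <- round(100 * prop.table(table(practice$smoking)), 1)
cancer_pct  <- round(100 * prop.table(table(practice$cancer)), 1)
print(smoking_pct)
print(cancer_pct)
```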
b. Run descriptive statistics for two continuous variables: Age and BMI. (4 points)
Fill in the following table.
Variable | Mean | Median | Standard Deviation | Q25 | Q75 | Range |
Age | | | | | | |
BMI | | | | | | |
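The columns of the table can be computed in one pass, as in the sketch below. The helper `describe_variable()` is a hypothetical name introduced here for illustration, and the Age and BMI values are made-up placeholders rather than values from practice1.csv. Range is reported as maximum minus minimum, which is an assumption about what the table expects.

```r
# Hypothetical stand-in for practice1.csv (placeholder values only)
practice <- data.frame(
  Age = c(34, 45, 52, 61, 29, 48),
  BMI = c(22.1, 27.4, 31.0, 24.8, 19.5, 28.3)
)

# Hypothetical helper: returns the six statistics requested in the table
describe_variable <- function(x) {
  c(Mean   = mean(x, na.rm = TRUE),
    Median = median(x, na.rm = TRUE),
    SD     = sd(x, na.rm = TRUE),
    Q25    = unname(quantile(x, 0.25, na.rm = TRUE)),
    Q75    = unname(quantile(x, 0.75, na.rm = TRUE)),
    Range  = diff(range(x, na.rm = TRUE)))
}

# One row per variable, one column per statistic, rounded to 2 decimals
summary_table <- round(t(sapply(practice[c("Age", "BMI")], describe_variable)), 2)
print(summary_table)
```

`quantile()` uses R's default interpolation method (type 7), so Q25 and Q75 may differ slightly from values produced by other software.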
c. Paste your boxplot and histogram graphs for BMI. (6 points)
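A minimal sketch of producing both graphs with base R is shown below. The BMI values are made-up placeholders, and saving to a PDF in a temporary folder is only one option assumed here for illustration. You could equally export the plots from your R environment in whatever way the course materials describe.

```r
# Hypothetical stand-in values; use the BMI column from practice1.csv instead
practice <- data.frame(BMI = c(22.1, 27.4, 31.0, 24.8, 19.5, 28.3))

# Save both plots side by side in one file
plot_file <- file.path(tempdir(), "bmi_plots.pdf")
pdf(plot_file, width = 9, height = 4.5)
par(mfrow = c(1, 2))  # one row, two panels

# Boxplot with a title and axis label
boxplot(practice$BMI,
        main = "Boxplot of BMI",
        ylab = "BMI")

# Histogram with a title and axis labels
hist(practice$BMI,
     main = "Histogram of BMI",
     xlab = "BMI",
     ylab = "Frequency")

dev.off()  # close the file so the plots are written to disk
```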