Final project Stat 454 Fall 2022
Guidelines:
This project will require programming in SAS
You are expected to work independently on this project
Your final project should be a typed report
You may include graphs in the body of the report, but no other SAS output. You may append relevant SAS output at the end of the report.
You should include a .txt file with your complete SAS code (including data input)
Objective:
The goal is to implement the techniques you learned in this course to analyze a complex data set in several steps. While the general analysis approaches will be specified, it’s up to you to verify assumptions and come to conclusions.
The data:
Variable name     Information
Species           Species of animal
Body_weight       Body weight in kg
Brain_weight      Brain weight in g
Nondream_sleep    Nondreaming sleep (hours/day)
Dream_sleep       Dreaming sleep (hours/day)
Total_sleep       Total sleep (hours/day)
Life_span         Maximum life span (years)
Gestation         Gestation time (days)
Predation         Predation index (1=least likely to be prey, 5=most likely to be prey)
Sleep_exposure    Sleep exposure index (1=least exposed, 5=most exposed)
Danger_index      Danger index (1=least danger from other animals, 5=most danger from other animals)
The purpose of model-building:
The main goal is to explain and predict the amount of sleep required (total sleep and dreaming sleep) based on general characteristics of the mammal. The data set includes information on 62 different mammals.
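As a starting point for the data input step, here is a minimal sketch of one way to read the data and check that it matches the description above. The file name, the CSV layout, and the data set name work.sleep are assumptions for illustration only; use whatever matches the file you were given.

```sas
/* Sketch only: the file name, path, and CSV layout are assumptions,
   not something specified in this handout. */
proc import datafile="sleep.csv"   /* hypothetical path */
            out=work.sleep         /* hypothetical data set name */
            dbms=csv
            replace;
    getnames=yes;                  /* first row holds the variable names */
run;

/* Check that 62 observations and the 11 listed variables were read. */
proc contents data=work.sleep;
run;

/* Count non-missing (N) and missing (NMISS) values for every numeric variable. */
proc means data=work.sleep n nmiss;
run;
```

If the observation count is not 62, or a numeric variable was read as character, fix the input step before moving on to any analysis.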
1. Create summary statistics and at least one descriptive graphic for each variable (besides species name).
2. Examine the correlations between the variables. Which pairs of variables are highly correlated?
3. Develop a multiple regression model to explain the total amount of sleep per day.
   a. Look at the correlation between total sleep and other variables you might include in the model (besides species name). Look at scatterplots. Would any variables benefit from a transformation? If so, prepare those variables to be added to the model.
   b. Model 1A: Include all reasonable variables in your model. If you left any variables out, explain why. Does this model significantly explain the total amount of sleep required for mammals per day? Support your answer.
   c. Model 1B: Are there any problems with multicollinearity in your model? If so, manage the problematic variable(s) and re-run your model. If not, skip to the next question. Either way, justify your answer.
   d. Model 1C: Use a model selection procedure to choose among the variables to include in your model. Which selection procedure did you choose and what criteria did you use? Does this model significantly explain the total amount of sleep per day? Support your answer. Which variables (if any) are statistically significant predictors of total sleep?
   e. Model 1D: Repeat the previous question, but use a different model selection method and a different criterion. (Note: even with different criteria, it is possible that you will get identical answers for models 1C and 1D.)
   f. Create a table that summarizes each of models 1A, 1B, 1C, and 1D. Include the variables in each model, their estimated coefficients, p-values for each coefficient, and at least one measure of model fit.
   g. Choose the “best” model out of 1A, 1B, 1C, and 1D. Why is that model the best one? Support your answer.
   h. Check the assumptions for the best model and comment on its validity. If any assumptions are violated, explain how you know this and what steps you might take to remedy the situation.
   i. Write the regression equation for the best model in the form $\hat{y} = b_0 + b_1 x_1 + \cdots$ with numerical estimates for coefficients plugged in. Interpret each coefficient.
4. Repeat question 3, but use the number of hours of dreaming sleep per day as the response variable.
5. Write a paragraph summarizing your findings from questions 3 and 4.
6. Extra questions: use a statistical method to explore the validity of the following statements using the data set provided. Explain your approach and your conclusion. There are multiple correct approaches to explore these problems.
   a. Animals with longer gestation periods are smarter, in general.
   b. Smaller animals are more likely to be prey than larger animals.
   c. Larger animals tend to sleep out in the open more than smaller animals.
   d. Animals that sleep more per day tend to live longer, in general.
   e. Predatory animals dream more than prey animals.
   f. Smarter animals live longer than other animals.
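The regression steps described above (correlations, a full model with a multicollinearity check, model selection, and diagnostics) could be laid out in SAS roughly as follows. This is a sketch under stated assumptions, not a recommended analysis: the data set name work.sleep, the log transformation, the short predictor list, and the entry/stay levels are all placeholders. Which variables, transformations, and selection criteria to use remains your decision to make and justify.

```sas
/* Assumes a data set named work.sleep (hypothetical name) that holds
   the variables listed in the data table. */

/* Example transformation only; judge from your own scatterplots
   whether any transformation is warranted. */
data work.sleep2;
    set work.sleep;
    log_body = log(Body_weight);   /* natural log */
run;

/* Pairwise Pearson correlations for an illustrative subset of variables. */
proc corr data=work.sleep2;
    var Total_sleep log_body Gestation Danger_index;
run;

/* Fit a model and request variance inflation factors (VIF option)
   plus the standard diagnostics panel. */
ods graphics on;
proc reg data=work.sleep2 plots=diagnostics;
    model Total_sleep = log_body Gestation Danger_index / vif;
run;
quit;

/* One possible selection method; the 0.15 levels are illustrative. */
proc reg data=work.sleep2;
    model Total_sleep = log_body Gestation Danger_index
          / selection=stepwise slentry=0.15 slstay=0.15;
run;
quit;
```

PROC REG accepts other SELECTION= values as well (for example forward, backward, adjrsq, and cp), which is one way to meet the requirement that Model 1D use a different method and criterion from Model 1C.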
Point allocation:
The entire project is worth 100 points total. You may earn up to 110 points with bonus efforts.
Problem 1: 10 points
Choose appropriate summary statistics and graphs. Make sure the graphs and table(s) are clear and easy to read.
Problem 2: 8 points
Display all appropriate pairwise correlations in a table. Note the variables with strong relationships.
Note that you must create the table; do not copy/paste SAS tables in your report.
Problems 3 and 4: 30 points each, allocated as follows:
Correlations/scatterplots and appropriate transformations: 3 points
Model 1A and commentary: 2 points
Model 1B, exploration of multicollinearity: 3 points
Model 1C, model selection: 2 points
Model 1D, model selection: 2 points
Table of all models: 5 points
“Best” model & justification: 2 points
Model assumptions/diagnostics: 6 points
You may earn up to 3 bonus points for “fixing” a model that violates assumptions. Make sure to explain your approach and conclusion. If you choose to do this, use the improved model to answer part (i) below.
Regression equation & interpretation of coefficients: 5 points
Problem 5: 10 points
Write a cogent, clear paragraph or two explaining which variables (if any) have an impact on total sleep and on dreaming sleep, how you know, and what the impact is.
Problem 6: 12 points
Pick four relationships out of a-f to explore. For each relationship, explain the statistical method you chose to explore it, why you chose that, and the results. Do not use the same statistical method for each relationship. You may support your conclusions with graphs, tables, and statistics. Make sure to mention how you know your approach is a valid one (in particular mention relevant assumptions). You may earn up to 3 points per relationship.
You may earn bonus points on this problem in two ways.
You can choose a deeper exploration. For example, you might include additional covariates in a regression model to account for the impact of those variables. You may earn 1 or 2 additional points for a deeper exploration.
You can explore additional relationships beyond the required four. Each additional relationship is worth 3 points.
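As an illustration of what "a statistical method" for Problem 6 might look like, the sketch below uses a Spearman rank correlation to relate body weight to the predation index, which is one of several defensible ways to look at the statement about smaller animals and prey. The data set name work.sleep is an assumption, and the choice of method is only an example. You would still need to explain why a rank-based measure suits an index on a 1 to 5 scale and what you conclude from it.

```sas
/* Assumes a data set named work.sleep (hypothetical name). */

/* Rank correlation between body weight and the 1-5 predation index.
   Spearman's coefficient depends only on ranks, so it does not
   require transforming body weight first. */
proc corr data=work.sleep spearman;
    var Body_weight Predation;
run;
```

Remember that the same method may not be used for every relationship, so other statements would need a different approach, such as a regression model or a comparison of group means.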