PHM553 Statistics Assignment Task
Introduction to the dataset:
The database for this assignment contains modified data from a clinical trial conducted at two sites in North Sudan (1 = New Halfa, 2 = Gezira). All participants had malaria at enrolment and were followed up for 42 days; all patients were seen on day 0 (day of enrolment) and on day 42. At each visit, malaria was diagnosed and parasite density, malaria species, body temperature and haemoglobin levels were recorded. In addition, G6PD* levels were measured with two different methods: an experimental biosensor and the reference method, spectrophotometry. Please note that each patient was infected with one of two malaria species: P. falciparum (Pf) or P. vivax (Pv).
*Not relevant for the assignment: glucose-6-phosphate dehydrogenase (G6PD) is an enzyme that protects red blood cells from oxidative damage. People with low levels of this enzyme can develop severe side effects (haemolysis) from standard Pv treatment, so determining G6PD levels before treatment is essential.
Where needed, variable names are indicated in square brackets [variable].
1a. Generate the variable Body Mass Index and label it accordingly
The formula for the Body Mass Index (BMI) is:
$$\mathrm{BMI} = \frac{\text{weight in kg}}{(\text{height in metres})^2}$$
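As a quick check of the formula, here is a minimal Python sketch (not part of the Stata workflow the assignment expects). The function names are illustrative, and the centimetre conversion is a hypothetical helper that only applies if the height variable is recorded in cm, which this brief does not state.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kg divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def cm_to_m(height_cm: float) -> float:
    """Hypothetical helper, only needed if height is recorded in centimetres."""
    return height_cm / 100


# Made-up example: 70 kg and 175 cm
print(round(bmi(70, cm_to_m(175)), 1))  # 22.9
```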
1b. A BMI
1c. Are most underweight participants male or female? Please provide proportions.
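The question asks for the sex distribution among underweight participants, i.e. proportions with the underweight group as the denominator. A minimal sketch, assuming the WHO adult cut-off of BMI < 18.5 kg/m² (the cut-off intended in 1b is not given here) and using hypothetical toy records:

```python
from collections import Counter

# Assumption: WHO adult cut-off, underweight means BMI below 18.5 kg/m^2
UNDERWEIGHT_CUTOFF = 18.5


def underweight_share_by_sex(records):
    """Among underweight participants, return the proportion in each sex.

    records: iterable of (sex, bmi) pairs.
    """
    counts = Counter(sex for sex, value in records if value < UNDERWEIGHT_CUTOFF)
    total = sum(counts.values())
    return {sex: n / total for sex, n in counts.items()} if total else {}


# Hypothetical toy data, not taken from the trial dataset
toy = [("male", 17.0), ("female", 18.0), ("female", 17.5), ("male", 22.0), ("female", 25.1)]
print(underweight_share_by_sex(toy))
```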
2a. Males and females were infected with one of two malaria species [Species], Pf or Pv. Do the proportions of species differ between males and females?
2b. Do the proportions differ between sites? Present the proportions as percentages (%) per site on day 0.
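Both 2a (sex × species) and 2b (site × species) compare proportions in a 2×2 table, for which a Pearson chi-square test is the usual choice (Fisher's exact test is preferred when expected counts fall below 5). The sketch below, with hypothetical counts, shows the row percentages and the test statistic; in Stata the analogous output comes from `tabulate` with the `row` and `chi2` options.

```python
import math


def chi2_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table: [[a, b], [c, d]], rows = groups (e.g. male/female), columns = Pf/Pv.
    Returns (row percentages, chi-square statistic, p-value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_tot = [a + b, c + d]
    col_tot = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    row_pct = [[100 * x / row_tot[i] for x in row] for i, row in enumerate(table)]
    return row_pct, chi2, p


# Hypothetical counts: rows = male/female, columns = Pf/Pv
pct, chi2, p = chi2_2x2([[30, 20], [20, 30]])
print(pct, round(chi2, 2), round(p, 4))
```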
3. Calculate the appropriate measure of central tendency (mean or median) to describe BMI for males and females separately, and assess whether BMI differs significantly between males and females.
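If BMI is skewed, the median is the better summary and a rank-based test is the matching comparison; for roughly normal data the mean and a two-sample t-test would be used instead. A minimal sketch of the Wilcoxon rank-sum (Mann-Whitney) test with a normal approximation, using hypothetical values. It averages tied ranks but does not tie-correct the variance, so p-values may differ slightly from software that does.

```python
from statistics import NormalDist, median


def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Ties get average ranks, but the variance is not tie-corrected.
    Returns (U for x, z, two-sided p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, z, p


# Hypothetical BMI values, not from the dataset
male = [18.2, 21.5, 19.9, 24.3, 17.8, 22.0]
female = [20.1, 23.4, 25.7, 19.5, 27.2, 22.8]
print(median(male), median(female))
print(rank_sum_test(male, female))
```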
4a. Haemoglobin (Hb) levels are an indicator of iron deficiency and potential blood loss. Hb was measured on the day of enrolment [D0Hb] and on the last day of follow-up [D42Hb] and is reported in g/dL. Assess whether Hb differed significantly between the two time points and state the result of your analysis in one sentence. Keep in mind that these are repeated measurements within the same individuals.
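Because D0Hb and D42Hb come from the same person, the analysis works on within-person differences rather than on two independent groups. A minimal sketch of the paired t statistic with hypothetical Hb values; the p-value then comes from a t distribution with n − 1 degrees of freedom. In Stata, `ttest D42Hb == D0Hb` runs a paired t-test, and `signrank` is the non-parametric alternative if the differences are clearly non-normal.

```python
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic on within-person differences (after - before).

    Returns (mean difference, t statistic, degrees of freedom).
    """
    if len(before) != len(after):
        raise ValueError("each participant needs both measurements")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = mean(diffs)
    t = d_bar / (stdev(diffs) / n ** 0.5)
    return d_bar, t, n - 1


# Hypothetical Hb values in g/dL for four participants
d0_hb = [10.0, 11.0, 12.0, 13.0]
d42_hb = [11.0, 13.0, 13.0, 15.0]
print(paired_t(d0_hb, d42_hb))
```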
4b. Do Hb values on day 0 [D0Hb] differ between males and females [Sex]?
5a. Develop a model to predict body weight that contains only significant predictors. Initially, consider the following independent variables:
Sex
Site
Height
Age
HEPC
Species
Present both your first and your final model by copying the respective Stata command and output into this document as a picture.
5b. What approach did you choose for variable selection? Justify your approach.
5c. Is your model significant? Present your p-value.
5d. In your own words, explain what the R² value represents.
5e. Which independent variable is the strongest predictor of weight? Explain the effect this predictor has on weight when all other variables are held constant.
5f. How does weight differ with age if all other variables remain constant?
5g. Which method would you have chosen to predict HEPC (presence or absence of hepatitis C infection), and why?
6a. Compare the risk of hepatitis C infection [HEPC] between males and females. What is the risk for each sex [Sex], and do the risks differ significantly? Justify the measure you choose. Remember that all variables must be coded as 0 and 1; if necessary, create a new variable rather than recoding an existing one.
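Since HEPC status is observed once per participant, the risk in each sex is simply the proportion infected, and the risk ratio compares the two. A minimal sketch with hypothetical counts and a log-scale (Katz) 95% confidence interval; which sex is treated as group 1 is an arbitrary choice here. Stata's `cs` command, given a 0/1 outcome and a 0/1 exposure variable, reports the same quantities together with a chi-square test.

```python
import math


def risk_ratio(cases_1, n_1, cases_0, n_0, z=1.96):
    """Risks in two groups, risk ratio (group 1 / group 0) and a 95% CI on the log scale."""
    risk_1, risk_0 = cases_1 / n_1, cases_0 / n_0
    rr = risk_1 / risk_0
    se_log = math.sqrt(1 / cases_1 - 1 / n_1 + 1 / cases_0 - 1 / n_0)
    ci = tuple(math.exp(math.log(rr) + sign * z * se_log) for sign in (-1, 1))
    return risk_1, risk_0, rr, ci


# Hypothetical counts: 10 of 50 females and 5 of 50 males with HEPC = 1
print(risk_ratio(10, 50, 5, 50))
```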
6b. Assume that all participants had been enrolled in another observational study assessing hepatitis C status [HEPC] from birth onwards, so that the period of observation is equivalent to age [Age]. What are the incidence rates for males and females, and do they differ significantly? Make sure to include the correct units in your reply.
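With follow-up running from birth, each participant contributes their age in years as person-time, so the incidence rate is cases divided by the summed ages, usually expressed per 1,000 person-years. A minimal sketch with hypothetical numbers, including the incidence rate ratio and a log-scale 95% CI; Stata's `ir` command takes the case variable, the exposure variable and the person-time variable for the same analysis.

```python
import math


def person_years(ages):
    """Person-years of observation when follow-up runs from birth to current age."""
    return sum(ages)


def rate_ratio(cases_1, py_1, cases_0, py_0, per=1000, z=1.96):
    """Incidence rates per `per` person-years and the rate ratio with a 95% CI."""
    rate_1, rate_0 = cases_1 / py_1, cases_0 / py_0
    irr = rate_1 / rate_0
    se_log = math.sqrt(1 / cases_1 + 1 / cases_0)
    ci = tuple(math.exp(math.log(irr) + sign * z * se_log) for sign in (-1, 1))
    return rate_1 * per, rate_0 * per, irr, ci


# Hypothetical: three participants aged 25, 30 and 45 contribute 100 person-years
print(person_years([25, 30, 45]))  # 100
# Hypothetical: 10 cases in 500 person-years vs 5 cases in 500 person-years
print(rate_ratio(10, 500, 5, 500))
```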
6c. In your own words, describe the concept of person-years of observation.
7a. In the course of this study a rapid diagnostic test [RDT] was evaluated for the diagnosis of hepatitis C. The hepatitis C diagnosis is confirmed by a gold standard [HEPC]. Calculate the performance of the RDT and paste the Stata output below.
7b. In less than 100 words, provide a brief overview of the four essential indicators that describe the performance of a test.
8a. G6PD activity was measured by spectrophotometry [G6PDspec, the reference method] and by an experimental biosensor [G6PDBS]. Calculate the correlation between the two variables.
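For two continuous, roughly normally distributed measurements, Pearson's r describes the strength of their linear association (Spearman's rank correlation is the usual alternative for skewed data). A high r alone does not show that the two methods agree. A minimal sketch with hypothetical values; in Stata, `pwcorr` with the `sig` option reports r and its p-value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists of measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical G6PD activity values, not from the dataset
g6pd_spec = [2.1, 5.4, 7.8, 9.0, 11.2]
g6pd_bs = [2.5, 5.0, 8.1, 9.6, 10.7]
print(round(pearson_r(g6pd_spec, g6pd_bs), 3))
```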
8b. Create a scatter plot of the two measures; make sure to label the axes and add a title.
8c. Using an appropriate format, visualize the distribution of G6PD activity as measured by spectrophotometry [G6PDspec].
8d. Show the distribution of Age by Sex using the most appropriate graph.
9a. You are now planning a new case-control study to assess the effect of smoking on the likelihood of a fatal outcome from COVID-19 infection. You assume that 20% of all controls (those who survive) smoke and that the odds ratio for smoking between cases and controls is 1.95. Since you find it difficult to identify sufficient cases (deaths), you plan to enrol two controls per case. For your results to be credible, you want a two-sided significance level of 5% (95% confidence) and 80% power. Assume a correlation of 0 between cases and controls, but increase your sample size by 10% to allow for procedural errors. What sample size will you need?
9b. Repeat the calculation using 90% power instead.
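Both 9a and 9b rest on the same calculation: convert the odds ratio into the expected smoking prevalence among cases, p1 = OR·p0 / (1 + p0(OR − 1)), then apply a two-proportion sample size formula with r controls per case. The sketch below uses the Fleiss (Kelsey) formula without continuity correction and, because the correlation is assumed to be 0, treats the design as unmatched. That formula choice is an assumption; Stata's power commands or other calculators may give slightly different numbers.

```python
import math
from statistics import NormalDist


def case_control_sample_size(p0, odds_ratio, controls_per_case,
                             alpha=0.05, power=0.80, inflate=0.10):
    """Unmatched case-control sample size (Fleiss formula, no continuity correction).

    p0: exposure prevalence among controls; inflate: extra fraction added for losses.
    Returns (cases, controls) after inflation, rounded up.
    """
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))  # exposure prevalence among cases
    r = controls_per_case
    p_bar = (p1 + r * p0) / (1 + r)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_b = NormalDist().inv_cdf(power)
    numerator = (z_a * math.sqrt((1 + 1 / r) * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0) / r)) ** 2
    n_cases = numerator / (p1 - p0) ** 2
    cases = math.ceil(n_cases * (1 + inflate))
    return cases, cases * r


cases_80, controls_80 = case_control_sample_size(0.20, 1.95, 2, power=0.80)
cases_90, controls_90 = case_control_sample_size(0.20, 1.95, 2, power=0.90)
print(cases_80, controls_80, cases_90, controls_90)
```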
Introduction to the dataset for Question 10:
Evidence is emerging of a noticeable waning of COVID-19 immunity six months after vaccination.
The Deaths sheet in Israel.xlsx contains weekly data for COVID-19 deaths in Israel, by age group (0-60 and >60) and vaccination status.
Vaccination status has been coded as follows:
1=vaccinated, no booster
2=unvaccinated
3=vaccinated, with booster
10a. Import the data for this question from the provided .xlsx file.
10b. Convert datetostr to a numeric date and format it appropriately.
10c. Create a variable containing the month component of the date.
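Converting a date string to a true date value is what makes the month extractable. A minimal Python sketch, assuming datetostr uses the day/month/year layout shown in the example table in 10f (e.g. "15/07/2021"). In Stata the analogous steps are the `date()` function with a "DMY" mask, a `%td` display format, and `month()`.

```python
from datetime import datetime


def parse_datetostr(text):
    """Parse a day/month/year string such as "15/07/2021" into a date."""
    # Assumption: the layout matches the example table, dd/mm/yyyy
    return datetime.strptime(text, "%d/%m/%Y").date()


d = parse_datetostr("15/07/2021")
print(d.isoformat(), d.month)  # 2021-07-15 7
```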
10d. Tabulate number of deaths by age group and month, showing column totals.
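The tabulation sums the weekly death counts into an age group × month table, and the column totals add both age groups within each month. A minimal sketch with hypothetical weekly rows (not from Israel.xlsx), stored as (agegrp, month, deaths) tuples:

```python
from collections import defaultdict


def deaths_by_agegrp_month(rows):
    """Cross-tabulate deaths by age group (rows) and month (columns).

    rows: iterable of (agegrp, month, deaths). Returns (table, column_totals).
    """
    table = defaultdict(lambda: defaultdict(int))
    column_totals = defaultdict(int)
    for agegrp, month, deaths in rows:
        table[agegrp][month] += deaths
        column_totals[month] += deaths
    return table, column_totals


# Hypothetical weekly rows, not from Israel.xlsx
weekly = [("0-60", 7, 2), (">60", 7, 10), (">60", 8, 5), ("0-60", 8, 1)]
table, totals = deaths_by_agegrp_month(weekly)
for agegrp in sorted(table):
    print(agegrp, dict(table[agegrp]))
print("Total", dict(totals))
```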
10e. Keep only the rows for the >60 age group, and drop the deaths variable.
10f. Reshape the data, transposing mortality from a row to a column variable, so the data looks like this:
agegrp  dateto     mort_per_1mpop_1  mort_per_1mpop_2  mort_per_1mpop_3  yr    wk  datetostr   mon
>60     15-Jul-21  0.9               0.7               0                 2021  28  15/07/2021  7
>60     22-Jul-21  1                 0.7               0                 2021  29  22/07/2021  7
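Steps 10e and 10f together turn one row per date and vaccination status (long format) into one row per date with a mortality column per status (wide format), which is conceptually what Stata's `reshape wide` does with the date as i() and the vaccination status as j(). In this minimal sketch, `vaxstatus` and the long-format name `mort_per_1mpop` are hypothetical (the latter inferred from the wide column names above); the >60 mortality values come from the example table, while the 0-60 row is made up.

```python
def keep_over60_and_reshape(rows):
    """Keep the >60 rows, drop deaths, and reshape long to wide by vaccination status.

    rows: dicts with keys agegrp, dateto, vaxstatus, mort_per_1mpop and deaths
    (vaxstatus and mort_per_1mpop are hypothetical long-format names).
    """
    wide = {}
    for r in rows:
        if r["agegrp"] != ">60":
            continue  # 10e: keep only the >60 age group
        key = (r["agegrp"], r["dateto"])
        # deaths is never copied into the output, which mirrors dropping the variable
        out = wide.setdefault(key, {"agegrp": r["agegrp"], "dateto": r["dateto"]})
        out[f"mort_per_1mpop_{r['vaxstatus']}"] = r["mort_per_1mpop"]
    return list(wide.values())


# deaths = None is a placeholder; real counts would come from the spreadsheet
long_rows = [
    {"agegrp": ">60", "dateto": "15-Jul-21", "vaxstatus": 1, "mort_per_1mpop": 0.9, "deaths": None},
    {"agegrp": ">60", "dateto": "15-Jul-21", "vaxstatus": 2, "mort_per_1mpop": 0.7, "deaths": None},
    {"agegrp": ">60", "dateto": "15-Jul-21", "vaxstatus": 3, "mort_per_1mpop": 0, "deaths": None},
    {"agegrp": "0-60", "dateto": "15-Jul-21", "vaxstatus": 1, "mort_per_1mpop": 0.1, "deaths": None},
]
wide = keep_over60_and_reshape(long_rows)
print(wide)
```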
10g. Create a line graph showing mortality over time for each of the vaccination categories. Use the labels “Vaccinated, no booster”, “Unvaccinated” and “Vaccinated, with booster” for the legend.