Part A: (15 Marks) Sampling methods (group component). Data cleaning (3 marks). Students will produce a stratified sample and one of a quota, systematic or simple random sample using a supplied Python script. Students explain how their sample relates to the larger dataset (2 marks).

Your Task
Complete Parts A and B below in class in week 6. Part C must be completed within 72 hours after your class finishes. Consider the rubric at the end of the assignment for guidance on structure and content.
Submit results as below:
• In class: Group sampling work is to be submitted as a software file (e.g. Excel, Power BI) via MyKBS at 1.5 hours into class time. You do not have to submit the Python script that you will be given in addition to the other resources.
• In class: Individual draft of data mining report is to be emailed to your lecturer at the end of class.
• Via Turnitin 72 hours after your class finishes: Final individual report of data mining
Assessment Description
• Business Problem: Suppose that you are a data analyst for the International Federation of Association Football (FIFA). You want to report on the statistics of world football players, given some of the many variables that are collected on them. You don’t, however, want to use the entire data set, so you decide to clean the data, take a sample and report on that.
• Data sets: your teacher will provide you with data on the day.
• Learning outcomes: LO2, LO3, LO4
Assessment Instructions
Part A: Group component (1.5 hours, 15 Marks) In Microsoft Excel or Power BI.
1. Open the data file. Perform some basic data cleansing (removing missing or incorrect values, e.g. Age = 0, missing country of origin, and other errors). (3 marks)
2. Recall the sampling methods below that you have learnt about in lectures.
You will be given a script that lets you run the quota, systematic, simple random and stratified sampling methods on the data.
Apply one of the following sampling methods (quota, systematic, or simple random) to obtain a subgroup of 5000 rows.
3. In a paragraph of approx. 200 words,
a) explain how this type of sample relates to the larger dataset, (2 marks)
b) provide a simple summary (statistical or visual) of three variables using the sampled data set. (3 marks)
4. Apply the stratified sampling method to the data using the script provided by your teacher. In a paragraph of approx. 200 words,
a) Provide a simple summary (statistical or visual) of the same three variables that you chose in 3b), given this new sample. (3 marks)
b) Interpret and compare the results of part 3 and 4. Also explain the limitations of the method chosen. (4 marks)
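The cleaning and sampling steps in items 1 to 4 can be sketched as follows. This is only an illustrative sketch using the Python standard library, not the script your teacher will supply. The column names `age` and `nationality`, the toy data and the sample size of 100 are assumptions made for the example.

```python
import random
from collections import defaultdict


def clean(rows):
    """Keep rows with a positive age and a non-empty nationality."""
    return [r for r in rows if (r.get("age") or 0) > 0 and r.get("nationality")]


def simple_random_sample(rows, n, seed=0):
    """Every row has the same chance of being selected."""
    return random.Random(seed).sample(rows, n)


def systematic_sample(rows, n, seed=0):
    """Pick a random start, then take every k-th row (k = len(rows) // n)."""
    k = len(rows) // n
    start = random.Random(seed).randrange(k)
    return rows[start::k][:n]


def stratified_sample(rows, n, key, seed=0):
    """Randomly sample each stratum in proportion to its share of the data.

    Rounding can make the total differ slightly from n.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in rows:
        strata[r[key]].append(r)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(rows))
        sample.extend(rng.sample(members, min(share, len(members))))
    return sample


# Made-up toy data (assumption): 1000 valid rows plus two rows to be cleaned out.
countries = ["Brazil"] * 5 + ["Spain"] * 3 + ["Japan"] * 2
players = [{"age": 17 + (i * 7) % 24, "nationality": countries[i % 10]}
           for i in range(1000)]
random.Random(1).shuffle(players)
players += [{"age": 0, "nationality": "Spain"}, {"age": 25, "nationality": ""}]

cleaned = clean(players)
srs = simple_random_sample(cleaned, 100)
syst = systematic_sample(cleaned, 100)
strat = stratified_sample(cleaned, 100, "nationality")
print(len(cleaned), len(srs), len(syst), len(strat))  # 1000 100 100 100
```

In this toy data the stratified sample keeps the 50/30/20 split of nationalities found in the cleaned data, while the simple random and systematic samples only approximate it.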
********************Submit your software file with the visualisation*********************
Individual Components (20 Marks)
Part B: Individual draft (1.5 hours (in class), 10 Marks) Use one of the sampled datasets from Part A for this exercise.
a) Select six diverse variables from the data. These variables must differ from those that the group chose in part A.
b) Start to create visualisations and summary statistics.
c) Submit a draft of your work at the end of class, as you will continue this exercise in Part C below.
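As a starting point for step b), summary statistics can be computed along the following lines. This is a minimal sketch using the Python standard library. The rows and the column names `age` and `position` are made up for illustration and are not taken from the supplied data.

```python
import statistics
from collections import Counter


def summarise_numeric(values):
    """Basic descriptive statistics for a numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def summarise_categorical(values, top=3):
    """Most frequent categories with their counts."""
    return Counter(values).most_common(top)


# Made-up rows (assumption), standing in for a sampled dataset.
sample = [
    {"age": 20, "position": "ST"},
    {"age": 22, "position": "GK"},
    {"age": 24, "position": "ST"},
    {"age": 26, "position": "CB"},
    {"age": 28, "position": "ST"},
]

age_summary = summarise_numeric([r["age"] for r in sample])
position_summary = summarise_categorical([r["position"] for r in sample], top=1)
print(age_summary)
print(position_summary)
```

The same summaries can be produced in Excel or Power BI; the sketch only shows which statistics are typically reported for numeric and categorical variables.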
Part C: Individual final report (72 hours outside of class, approximately 400 words, 10 Marks)
a) In a paragraph of approximately 300 words, interpret your results and visualisations. [6 marks]
b) List an advantage and possible disadvantage of the sampling method that you chose for this exercise. [2 marks]
c) Explain the difference between non-probability and probability sampling. [2 marks]
d) Submit via Turnitin within 72 hours of the class
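The contrast in part c) can be illustrated with a small sketch. It assumes one simple way of modelling quota sampling (filling each group's quota with the first rows found in file order), which may differ from the supplied script. The columns `age` and `foot` and the toy population are made up for the example.

```python
import random
import statistics


def quota_sample(rows, quotas, key):
    """Non-probability: take rows in file order until each group's quota is full."""
    remaining = dict(quotas)
    picked = []
    for r in rows:
        if remaining.get(r[key], 0) > 0:
            picked.append(r)
            remaining[r[key]] -= 1
    return picked


def simple_random_sample(rows, n, seed=0):
    """Probability: every row has the same known chance of selection."""
    return random.Random(seed).sample(rows, n)


# Made-up population (assumption), sorted by age: 250 "Left" and 750 "Right" rows.
population = [{"age": 17 + i // 50, "foot": "Left" if i % 4 == 0 else "Right"}
              for i in range(1000)]

quota = quota_sample(population, {"Left": 25, "Right": 75}, "foot")
srs = simple_random_sample(population, 100)

population_mean = statistics.mean(r["age"] for r in population)
quota_mean = statistics.mean(r["age"] for r in quota)
srs_mean = statistics.mean(r["age"] for r in srs)
print(population_mean, quota_mean, srs_mean)
```

Because the toy file is sorted by age, the quota sample matches the Left/Right proportions but only contains the youngest players, so its mean age (17.5) is far from the population mean (26.5). The simple random sample does not depend on file order, so its mean can be expected to lie much closer to the population value.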
Assessment Marking Guide
Grade bands: NN (Fail) 0%-49%, P (Pass) 50%-64%, CR (Credit) 65%-74%, DN (Distinction) 75%-84%, HD (High Distinction)
Part A (group component):
• NN (Fail): Incorrect sampling. Parts missing. Little or no explanation of results or method.
• P (Pass): Basic requirements met. Summary brief and general. May be a poor comparison.
• CR (Credit): All parts present, relevant explanation and well summarised. Good comparison. May lack some detail.
• DN (Distinction): All parts present and detail provided on methods and summary. Solid and relevant. Well thought out summary and comparisons.
• HD (High Distinction): All parts present and well integrated group ... Deep detail provided on methods and ... Novel and engaging summary.
Part B: (10 Marks) (Individual component in class) Students must select five diverse variables (which differ from the group ones) and start to visualise and summarise the data. A draft report must be emailed at the end of class.
• NN (Fail): Nothing started in class. Variables not different or diverse. Parts missing.
• P (Pass): Basic requirements met with five new variables and visualisations started. May not be that ...
• CR (Credit): Five new variables and visualisations started. Diversity evident.
• DN (Distinction): Five new variables and visualisations started. Diversity evident. Start of good ...
• HD (High Distinction): Five very diverse variables chosen and visualisations started. Diversity evident. Start of well integrated interpretation.
Page 4 Kaplan Business School Assessment Outline
Part C: (10 Marks) (Individual component at home) A well polished report, consistent with the class work, must be done (6 marks). Answers to theoretical sections should be evident (2 + 2 = 4 marks).
• NN (Fail): Parts missing. Final report different from draft and too general. No theory discussed. No effort at home.
• P (Pass): Basic report on five new variables and brief explanations of theory done. All consistent with class work. May be small improvement to class work.
• CR (Credit): Good report on five new variables and full and relevant explanations of theory done. All consistent with class work and clearly tried to improve and complete most of the class work done.
• DN (Distinction): Good report on five new diverse variables and detailed relevant explanations of theory. All consistent with class work. Evidence of extra work done to improve class work, make the report flow well and complete.
• HD (High Distinction): Engaging report on five new diverse variables and detailed, well integrated, relevant explanations of theory done. Excellent flow of report. All consistent with class work, complete and evidence of extra work to provide a novel approach.
