Data Analytics & Consumer Insights Individual Assessment Report | SETU
Assessment Report
| Programme(s) | MSc in Digital Marketing CWB07; PGDip in Digital Marketing Analytics CWB08 |
| Module | Data Analytics & Consumer Insights |
| Lecturer | Dr Denise Earle |
| Assessment Type | Report |
| MLOs Assessed | Module Learning Outcomes |
| | 1. Critically reflect on, evaluate and apply key descriptive analysis techniques when carrying out analysis of digital marketing data. |
| | 2. Critically reflect on, evaluate and apply key inferential statistical techniques when carrying out analysis of digital marketing data. |
| X | 3. Critically reflect on, evaluate and apply data visualisation best practices using key software tools. |
| X | 4. Critically reflect on, evaluate and apply key software tools for carrying out advanced statistical/data mining techniques for digital marketing data. |
| X | 5. Synthesise, evaluate, critically reflect on and communicate consumer insights so as to strategically apply them to organisations in a marketing context. |
| Knowledge Assessed | R, Segmentation, Clustering, Propensity Modelling, Predictive Analytics, Classification |
| Team / Individual | Individual |
| Date | 11:59pm Tuesday 21st December |
| Overall Weighting | 50% |
| Late Submission Details | Late submissions will be dealt with in accordance with the SETU Carlow policy:
|
Instructions
- Please use R/RStudio and Quarto to create an HTML file containing the code, output and interpretation/commentary required to answer the following questions.
- Your code must adhere to Coding Best Practices (see document on Blackboard).
- Your graphs and tables must adhere to Data Visualisation Best Practices.
Submission Details
- Email your Quarto script (the .qmd file) to earle@setu.ie by the deadline indicated above.
- You must also send the RPubs link to your published report.
Part A – Exploratory Analysis of the Bank Product Dataset
A bank wants to identify what drives certain customers to churn from their easy-saver account product. The bank collected data about their customers who held an easy-saver account over the past 24 months. For each customer, they recorded whether the customer churned from the product. They also recorded the following independent variables, which are saved in the bank_training.csv (containing 8000 customers) and bank_testing.csv (containing 2000 customers) datasets.
- Customer_id: customer identifier.
- Credit_score: the customer’s credit score, which is a measure of the person’s likelihood to repay a debt.
- Geography: the country where the customer lives.
- Age: the customer’s age.
- Tenure: the number of years the customer has been with the bank.
- Balance: the customer’s current account balance on the date they churned from the easy-saver product.
- Num_products: the number of products the customer has signed up to with the bank. For example, a product could be a current account, saving account, credit card, debit card, mortgage, mortgage protection insurance, etc…
- Has_credit_card: indicates whether the customer has a credit card with the bank.
- Estimate_salary: the customer’s estimated salary.
- Churn: indicates whether the customer churned from (i.e. left) the bank’s easy-saver product.
- Import the bank_training.csv and bank_testing.csv datasets into R.
- Using the bank_training.csv dataset, carry out a visual exploration of the data to understand the relationship between whether a customer churns (variable called “churn”) and each of the other potential predictor variables.
- Your graphs should adhere to Data Visualisation Best Practices.
- All graphs must be interpreted and commented upon.
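A minimal sketch of the kind of exploration described above, in base R. It uses a small simulated data frame (`toy`) as a stand-in for bank_training.csv so that it runs on its own. The values, the country labels and the lower-case column names are assumptions made for illustration, not the contents of the real file.

```r
# Simulated stand-in for bank_training.csv (hypothetical values, not the real data).
# With the real files you would instead start from something like:
#   bank_training <- read.csv("bank_training.csv")
#   bank_testing  <- read.csv("bank_testing.csv")
set.seed(1)
n <- 200
toy <- data.frame(
  age       = round(runif(n, 18, 80)),
  geography = sample(c("Country A", "Country B", "Country C"), n, replace = TRUE)
)
# Assumed pattern, for illustration only: older customers churn more often.
toy$churn <- factor(ifelse(runif(n) < (toy$age - 18) / 80, "Yes", "No"),
                    levels = c("No", "Yes"))

# Numeric predictor vs churn: side-by-side boxplots.
boxplot(age ~ churn, data = toy,
        main = "Customer age by churn status",
        xlab = "Churned", ylab = "Age (years)")

# Categorical predictor vs churn: proportions within each country (rows sum to 1),
# drawn as a stacked bar chart with one bar per country.
churn_by_geo <- prop.table(table(toy$geography, toy$churn), margin = 1)
barplot(t(churn_by_geo), legend.text = TRUE,
        main = "Churn rate by country",
        xlab = "Country", ylab = "Proportion of customers")
```

The same two patterns (a boxplot for each numeric predictor, a proportion chart for each categorical one) could be repeated for the remaining variables, with each graph followed by its own interpretation.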
Part B – Predicting Customers Who Will Churn from a Bank Product
- Following on from your visual exploration in Part A, create and visualise a classification tree model that will allow you to predict if a customer will churn.
- Make sure to include all possible predictor variables when creating the classification tree model.
- Interpret the classification tree:
- Clearly state one rule for predicting if a customer will churn. Your answer should also address how pure the node is.
- Clearly state one rule for predicting if a customer will not churn. Your answer should also address how pure the node is.
- Which variables are considered important for predicting if a customer will churn? Explain your answer.
- Compare the results of your visual exploration in Part A to your findings in Part B.2.c. Are there variables that appeared important from the visual exploration for predicting if a customer will churn, but that ended up not being important in the classification tree? And vice-versa, are there variables that appeared unimportant from the visual exploration, but that ended up being important in the classification tree?
- Fully assess the accuracy of the classification tree using both the training and the testing datasets.
- Based on your findings, do you think the classification tree is overfitting the training dataset? Explain your answer.
- Based on your analysis, suggest some actions the company could take to improve their churn rate. How could your classification tree model be used for marketing purposes?
- Ensure your suggested actions relate directly to the dataset and analysis completed as part of this assignment.
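The modelling steps in Part B can be sketched as follows, assuming the `rpart` package is used for the classification tree (the brief does not name a package). The data are simulated stand-ins for the training and testing files, and `make_toy` and `assess` are hypothetical helper names introduced only for this example.

```r
library(rpart)  # recommended package that ships with standard R installations

# Simulated stand-ins for the training and testing files (hypothetical values).
set.seed(42)
make_toy <- function(n) {
  d <- data.frame(
    age          = round(runif(n, 18, 80)),
    num_products = sample(1:4, n, replace = TRUE),
    balance      = round(runif(n, 0, 200000))
  )
  # Assumed rule, for illustration only: older single-product customers tend to churn.
  p <- ifelse(d$age > 50 & d$num_products == 1, 0.8, 0.1)
  d$churn <- factor(ifelse(runif(n) < p, "Yes", "No"), levels = c("No", "Yes"))
  d
}
train <- make_toy(800)
test  <- make_toy(200)

# Fit a classification tree using every other column as a predictor.
tree <- rpart(churn ~ ., data = train, method = "class")

print(tree)               # text view of the splits and the class mix in each node
tree$variable.importance  # larger values indicate more influential predictors

# Draw the tree with base graphics (only possible if at least one split was made).
if (nrow(tree$frame) > 1) {
  plot(tree, margin = 0.1)
  text(tree, use.n = TRUE)
}

# Accuracy on a dataset: confusion matrix plus overall proportion correct.
assess <- function(model, data) {
  pred <- predict(model, newdata = data, type = "class")
  cm <- table(actual = data$churn, predicted = pred)
  list(confusion = cm, accuracy = sum(diag(cm)) / sum(cm))
}
train_fit <- assess(tree, train)
test_fit  <- assess(tree, test)
train_fit$accuracy
test_fit$accuracy
```

A training accuracy noticeably higher than the testing accuracy would be one sign of overfitting. Note that with the real data, `churn ~ .` would also pull in an identifier column such as the customer id, which is not a meaningful predictor and would typically be dropped before fitting.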
Part C – Segmenting Bank Customers
A well-known bank is trying to increase their profits. They have identified that most of their customers have saving deposit accounts, which are not profitable for the bank. However, loans are much more profitable and so they wish to create a marketing campaign that will encourage customers to take out a personal loan. Your goal is to create customer segments for the marketing campaign that will help the bank optimise their communications with their customers.
The bank_personal_loan.csv file contains the following variables for 5,000 customers:
- id: unique identifier for each customer.
- age: customer’s age in years.
- experience: the work experience of the customer in years.
- income: estimated annual income of the customer (in €000s)
- cc_avg: average spending on credit cards per month (in €000s)
- personal_loan: this variable indicates if the customer has taken out a loan previously. It is reasonable to assume that if a customer took out a loan in the past, then they are likely to take out another loan in the future.
Tasks
- Import the bank_personal_loan.csv file into R.
- You must cluster the customers based on age, experience, income and cc_avg only (i.e. do not include the id and personal_loan variables in the clustering process). Create a sub-dataset containing only these variables.
- Does the data need to be scaled before computing the distance matrix containing the Euclidean distance between all pairs of customers? Explain your answer.
- Create the Euclidean distance matrix using the variables contained in the sub-dataset created in Question 2.
- Carry out a hierarchical clustering using the hclust function. You do not need to specify a linkage method.
- Create a 3-cluster solution using the cutree function and assess the quality of this solution.
- Profile the clusters making sure to comment on the size of each cluster, the average age, average experience, average income and average credit card spend. Support your commentary with suitable tables and graphs. Ensure all graphs and tables follow Data Visualisation Best Practices.
- Write a short paragraph summarising the profile for each cluster/segment.
- Which segment(s) is(are) most likely to take out a personal loan in the future? Hint: profile the clusters/segments using the personal_loan variable.
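The clustering tasks above can be sketched end to end as follows. The data frame is a simulated stand-in for bank_personal_loan.csv with hypothetical values, and the average silhouette width (from the `cluster` package) is an assumed choice of quality measure, since the brief does not name one.

```r
library(cluster)  # recommended package; provides silhouette()

# Simulated stand-in for bank_personal_loan.csv (hypothetical values).
set.seed(7)
n <- 150
toy <- data.frame(
  id  = seq_len(n),
  age = round(runif(n, 23, 67))
)
toy$experience    <- pmax(toy$age - 25 + round(rnorm(n, 0, 2)), 0)
toy$income        <- round(rlnorm(n, meanlog = 4, sdlog = 0.6))  # in thousands
toy$cc_avg        <- round(toy$income / 40 + abs(rnorm(n, 0, 0.5)), 1)
toy$personal_loan <- rbinom(n, 1, ifelse(toy$income > 100, 0.4, 0.05))

# Keep only the clustering variables (id and personal_loan are excluded).
vars <- toy[, c("age", "experience", "income", "cc_avg")]

# The variables have very different spreads, so unscaled Euclidean distance
# would be dominated by the widest-ranging one. Standardise to mean 0, sd 1.
sapply(vars, sd)
vars_scaled <- scale(vars)

# Euclidean distances between all pairs of customers, then hierarchical
# clustering (hclust uses complete linkage when no method is given).
d  <- dist(vars_scaled, method = "euclidean")
hc <- hclust(d)

# Three-cluster solution and its average silhouette width (closer to 1 is better).
toy$cluster <- cutree(hc, k = 3)
sil <- silhouette(toy$cluster, d)
mean(sil[, "sil_width"])

# Profile: cluster sizes, variable means, and share of past loan holders per cluster.
table(toy$cluster)
profile <- aggregate(cbind(age, experience, income, cc_avg, personal_loan) ~ cluster,
                     data = toy, FUN = mean)
profile
```

Because `personal_loan` is coded 0/1 in this sketch, its mean within a cluster is the proportion of customers in that segment who previously took out a loan, which is one way to compare segments on likely future uptake.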