ENG335: Machine Learning
Questions:
Question 1
(a) Read about the term “Explainable AI”. In your own words, explain this term and appraise the need for it.
(b) Read about Waze carpool from https://cloud.google.com/blog/products/ai-machinelearning/how-waze-predicts-carpools-using-google-cloud-ai-platform.
(i) Formulate the problem statement that the AI platform is trying to solve, and explain how the platform is being used for the carpool.
(ii) From the information provided in the blog, give the number of parameters and number of records used for training the model. If you think the values are not stated in the article, then state that the values are not available.
(iii) What is the maximum latency provided by the Google AI platform for this application? Provide a numerical value. If it is not available, then state that the latency information is not available.
(iv) In the high-level schema architecture, there is offline and online processing. Explain what you infer from this schema architecture. Specify the algorithms and framework being used.
(v) Is the learning supervised or unsupervised? Explain your answer.
(c) Construct a problem statement relevant to Singapore that can be solved using an AI platform and cloud infrastructure offered by AWS, Google, or other vendors. You are required to provide the following details:
(i) State the problem/scenario. You can construct your own problem or discuss any AI solution relevant to Singapore.
(ii) List any FOUR (4) parameters that will be present in your dataset used for training.
(iii) Is the learning supervised or unsupervised? If supervised, then provide the target variable.
Question 2
Download the real estate dataset from the Kaggle link
https://www.kaggle.com/quantbruce/real-estate-price-prediction
(a) Perform exploratory data analysis and identify the parameter
(b) Design a linear regressor to predict the price of the house using only TWO (2) parameters. Specify the linear regression equation obtained from learning the dataset. Explain what you infer from observing the linear regression equation.
Note: You are required to select the best TWO (2) parameters and justify your selection.
(c) Assess the performance of the linear regressor by computing the relevant performance metrics. You need to provide any THREE (3) metrics and explain the importance of each.
Question 3
Download the Iris dataset from the scikit-learn package.
(a) Perform exploratory data analysis and understand the dataset. Select any TWO (2) classes from the dataset. Implement a suitable algorithm from what you have learned in the class for predicting the target in the Iris dataset.
(b) Implement a Naïve Bayes classifier for the Iris dataset.
(c) Compare the performance metrics of the algorithm in Question 3(a) and the Naïve Bayes classifier. Does the scaling of the parameters have any impact on the performance? (Justify your answer)
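For Question 3, the loading and class-selection step can be sketched as follows. This is only a minimal sketch, assuming scikit-learn's `load_iris` loader; the choice of classes 0 and 1 is arbitrary and purely illustrative, and the names `X_two` and `y_two` are hypothetical.

```python
from sklearn.datasets import load_iris

# Load the Iris dataset bundled with scikit-learn (no download needed).
iris = load_iris()
X, y = iris.data, iris.target

# Keep only two of the three classes. Classes 0 and 1 are an arbitrary,
# illustrative choice; any other pair can be selected the same way.
mask = (y == 0) | (y == 1)
X_two, y_two = X[mask], y[mask]

print(iris.feature_names)
print(iris.target_names)
print(X_two.shape)
```

The exploratory analysis and the choice of algorithm are left to the questions above.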
Question 4
Use the breast cancer dataset available in the sklearn package. You are required to show the steps for loading the dataset, perform exploratory data analysis, and identify an algorithm (from those covered in the seminars) suitable for detecting breast cancer. Present appropriate performance metrics. You can use the following Python code to load the dataset.
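A minimal sketch of such loading code, assuming scikit-learn's `load_breast_cancer` is the intended loader (the variable names are illustrative, not prescribed by the question):

```python
from sklearn.datasets import load_breast_cancer

# Load the breast cancer dataset bundled with scikit-learn.
data = load_breast_cancer()

# Feature matrix and target vector as NumPy arrays.
X, y = data.data, data.target

# Basic facts about the dataset, useful as a starting point for the analysis.
print(X.shape)              # (number of samples, number of features)
print(data.feature_names)   # names of the feature columns
print(data.target_names)    # class labels corresponding to targets 0 and 1
```

Passing `as_frame=True` to the loader returns pandas objects instead, which may be more convenient for exploratory analysis if pandas is installed.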